1 CTGCAGCGTGCCCAGCTGTTCGTGGTGGTGATCGCGGCCGCGCTGGCCGCCGTCGCGGTC
61 GCCGCCGCCGGGCCGATCGAGTTCGTCGCCTTCGTCGTGCCGCAGATCGCGCTGCGGCTC 
121 TGCGGCGGCAGCCGGCCGCCCCTGCTCGCCTCGGCGATGCTCGGCGCGCTGCTGGTGGTC 
181 GGCGCCGACCTGGTCGCTCAGATCGTGGTGGCGCCGAAGGAGCTGCCGGTCGGCCTGCTC 
241 ACCGCGATGATCGGCACCCCGTACCTGCTCTGGCTCCTGCTTCGGCGATCAAGAAAGGTG 
301 AGCGGATGAACGCCCGCCTGCGTGGCGAGGGCCTGCACCTCGCGTACGGGGACCTGACCG 
361 TGATCGACGGCCTCGACGTCGACGTGCACGACGGGCTGGTCACCACCATCATCGGGCCCA 
421 ACGGGTGCGGCAAGTCGACGCTGCTCAAGGCGCTCGGCCGGCTGCTGCGCCCGACCGGCG 
481 GGCAGGTGCTGCTGGACGGCCGCCGCATCGACCGGACCCCCACCCGTGAOSTGGCCCGGG 
541 TGCTCGGCGTGCTGCCGCAGTCGCCCACCGCGCCCGAAGGGCTCACCGTCGCCGACCTGG 
601 TGATGCGCGGCCGGCACCCGCACCAGACCTGGTTCCGGCAGTGGTCGCGCGACGACGAGG 
661 ACCAGGTCGCCGACGCGCTGCGCTGGACCGACATGCTGGCGTACGCGGACCGCCCGGTGG 
721 ACGCCCTCTCCGGCGGTCAGCGCCAGCGCGCCTGGATCAGCATGGCGCTGGCCCAGGGCA 
781 CCGACCTGCTGCTGCTGGACGAGCCGACCACCTTCCTCGACCTGGCCCACCAGATCGACG 
841 TGCTGGACCTGGTCCGCCGGCTGCACGCCGAGATGGGCCGGACCGTGGTGATGGTGCTGC 
901 ACGACCTGAGCCTGGCCGCCCGGTACGCCGACCGGCTGATCGCGATGAAGGACGGCCGGA 
961 TCGTGGCGAGCGGGGCGCCGGACGAGGTGCTCACCCCGGCGCTGCTGGAGTCGGTCTTCG 
1021 GGCTGCGCGCGATGGTGGTGCCCGACCCGGCGACCGGCACCCCGCTGGTGATCCCCCTGC 
1081 CGCGCCCCGCCACCTCGGTGCGGGCCTGAAATCGATGAGCGTGGTTGCTTCATCGGCCTG 
1141 CCGAGCGATGAGAGTATGTGGGCGGTAGAGCGAGTCTCGAGGGGGAGATGCCGCCGTGAC 

V T 

1201 GTCCTCGTACATGCGCCTGAAAGCAGCAGCGATCGCCTTCGGTGTGATCGTGGCGACCGC 
3 SSYMRLKAAAIAFGVIVATA 

1261 AGCCGTGCCGTCACCCGCTTCCGGCAGGGAACATGACGGCGGCTATGCGGCCCTGATCCG 
23 AVPSPASGREHDGGYAALIR 

1321 CCGGGCCTCGTACGGCGTCCCGCACATCACCGCCGACGACTTCGGGAGCCTCGGTTTCGG
43 RASYGVPHITADDFGSLGFG 

1381 CGTCGGGTACGTGCAGGCCGAGGACAACATCTGCGTCATCGCCGAGAGCGTAGTGACGGC
63 VGYVQAEDNICVIAESVVTA 

1441 CAAGGGTGAGCGGTCGCGGTGGTTCGGTGCGACCGGGCCGGACGACGCCGATGTGCGCAG 
83 NGERSRWFGATGPDDADVRS 


FIG. 13A 
1501 CGACCTCTTCCACCGCAAGGCGATCGACGACCGCGTCGCCGAGCGGCTCCTCGAAGGGCC
103 DLFHRKAIDDRVAERLLEGP 

1561 CCGCGACGGCGTGCGGGCGCCGTCGGACGACGTCCGGGACCAGATGCGCGGCTTCGTCGC 
123 RDGVRAPSDDVRDQMRGFVA 

1621 CGGCTACAACCACTTCCTACGCCGCACCGGCGTGCACCGCCTGACCGACCCGGCGTGCCG 
143 GYNHFLRRTGVHRLTDPACR 

1681 CGGCAAGGCCTGGGTGCGCCCGCTCTCCGAGATCGATCTCTGGCGTACGTCGTGGGACAG 
163 GKAWVRPLSEIDLWRTSWDS 

1741 CATGGTCCGGGCCGGTTCCGGGGCGCTGCTCGACGGCATCGTCGCCGCGACGCCACCTAC 
183 MVRAGSGALLDGIVAATPPT 

1801 AGCCGCCGGGCCCGCGTCAGCCCCGGAGGCACCCGACGCCGCCGCGATCGCCGCCGCCCT 
203 AAGPASAPEAPDAAAIAAAL 

1861 CGACGGGACGAGCGCGGGCATCGGCAGCAACGCGTACGGCCTCGGCGCGCAGGCCACCGT 
223 DGTSAGIGSNAYGLGAQATV 

1921 GAACGGCAGCGGGATGGTGCTGGCCAACCCGCACTTCCCGTGGCAGGGCGCCGCACGCTT 
243 NGSGMVLANPHFPWQGAARF 

1981 CTACCGGATGCACCTCAAGGTGCCCGGCCGCTACGACGTCGAGGGCGCGGCGCTGATCGG 
263 YRMHLKVPGRYDVEGAALIG 

2041 CGACCCGATCATCGGGATCGGGCACAACCGCACGGTCGCCTGGAGCCACACCGTCTCCAC 
283 DPIIGIGHNRTVAWSHTVST

2101 CGCCCGCCGGTTCGTGTGGCACCGCCTGAGCCTCGTGCCCGGCGACCCCACCTCCTATTA 
303 ARRFVWHRLSLVPGDPTSYY 

2161 CGTCGACGGCCGGCCCGAGCGGATGCGCGCCCGCACGGTCACGGTCCAGACCGGCAGCGG
323 VDGRPERMRARTVTVQTGSG 


FIG. 13A 
2221 CCCGGTCAGCCGCACCTTCCACGACACCCGCTACGGCCCGGTGGCCGTGATGCCGGGCAC
343 PVSRTFHDTRYGPVAVMPGT 

2281 CTTCGACTGGACGCCGGCCACCGCGTACGCCATCACCGACGTCAACGCGGGCAACAACCG 
363 FDWTPATAYAITDVNAGNNR 

2341 CGCCTTCGACGGGTGGCTGCGGATGGGCCAGGCCAAGGACGTCCGGGCGCTCAAGGCGGT 
383 AFDGWLRMGQAKDVRALKAV 

2401 CCTCGACCGGCACCAGTTCCTGCCCTGGGTCAACGTGATCGCCGCCGACGCGCGGGGCGA 
403 LDRHQFLPWVNVIAADARGE 

2461 GGCCCTCTACGGCGATCATTCGGTCGTCCCCCGGGTGACCGGCGCGCTCGCTGCCGCCTG 
423 ALYGDHSVVPRVTGALAAAC 

2521 CATCCCGGCGCCGTTCCAGCGGCTCTACGCCTCCAGCGGCCAGGCGGTCCTGGACGGTTC 
443 IPAPFQPLYASSGQAVLDGS

2581 CCGGTCGGACTGCGCGCTCGGCGCCGACCCCGACGCCGCGGTCCCGGGCATTCTCGGCCC 
463 RSDCALGADPDAAVPGILGP 

2641 GGCGAGCCTGCCGGTGCGGTTCCGCGACGACTACGTCACCAACTCCAACGACAGTCACTG 
483 ASLPVRFRDDYVTNSNDSHW 

2701 GCTGGCCAGCCCGGCCGCCCCGCTGGAAGGCTTCCCGCGGATCCTCGGCAACGAACGCAC 
503 LASPAAPLEGFPRILGNERT 

2761 CCCGCGCAGCCTGCGCACCCGGCTCGGGCTGGACCAGATCCAGCAGCGCCTCGCCGGCAC 
523 PRSLRTRLGLDQIQQRLAGT 

2821 GGACGGTCTGCCCGGCAAGGGCTTCACCACCGCCCGGCTCTGGCAGGTCATGTTCGGCAA 
543 DGLPGKGFTTARLWQVMFGN 

2881 CCGGATGCACGGCGCCGAACTCGCCCGCGACGACCTGGTCGCGCTCTGCCGCCGCCAGCC 
563 RMHGAELARDDLVALCRRQP 
2941 GACCGCGACCGCCTCGAACGGCGCGATCGTCGACCTCACCGCGGCCTGCACGGCGCTGTC
583 TATASNGAIVDLTAACTALS 

3001 CCGCTTCGATGAGCGTGCCGACCTGGACAGCCGGGGCGCGCACCTGTTCACCGAGTTCGC 
603 RFDERADLDSRGAHLFTEFA 

3061 CCTCGCGGGCGGAATCAGGTTCGCCGACACCTTCGAGGTGACCGATCCGGTACGCACCCC 
623 LAGGIRFADTFEVTDPVRTP 

3121 GCGCCGTCTGAACACCACGGATCCGCGGGTACGGACGGCGCTCGCCGACGCCGTGCAACG 
643 RRLNTTDPRVRTALADAVQR 

3181 GCTCGCCGGCATCCCCCTCGACGCGAAGCTGGGAGACATCCACACCGACAGCCGCGGCGA 
663 LAGIPLDAKLGDIHTDSRGE 

3241 ACGGCGCATCCCCATCCACGGTGGCCGCGGGGAAGCAGGCACCTTCAACGTGATCACCAA 
683 RRIPIHGGRGEAGTFNVITN 

3301 CCCGCTCGTGCCGGGCGTGGGATACCCGCAGGTCGTCCACGGAACATCGTTCGTGATGGC 
703 PLVPGVGYPQVVHGTSFVMA

3361 CGTCGAACTCGGCCCGCACGGCCCGTCGGGACGGCAGATCCTCACCTATGCGCAGTCGAC 
723 VELGPHGPSGRQILTYAQST 

3421 GAACCCGAACTCACCCTGGTACGCCGACCAGACCGTGCTCTACTCGCGGAAGGGCTGGGA 
743 NPNSPWYADQTVLYSRKGWD 

3481 CACCATCAAGTACACCGAGGCGCAGATCGCGGCCGACCCGAACCTGCGCGTCTACCGGGT 
763 TIKYTEAQIAADPNLRVYRV

3541 GGCACAGCGGGGACGCTGACCCACGTCACGCCGGCTCGGCCCGTGCGGGGGCGCAGGGCG 
783 A Q R G R * 
3601 CCGATCGTCTCTGCATCGCCGGTCAGCCGGGGCCTGCGTCGACCGGCGGCGGCCGGTCGA
3661 CGCCCGCGTCCCGGCGCAGCGACTGGCTGAAGCGCCAGGCGTCGGCGGCCCGGGGCAGGT 
3721 TGTTGAACATCACGTACGCCGGGCCGCCGTCGAGGATGCCGGCGAGGTGTGCCAGCTCGG 
3781 CATCCGTGTACACATGCCGGGCGCCGGTGATGCCGTGCAGCCGGTAATAGGCCATCGGCG
3841 TCAGACTGCGGCGCAGGAACGGGTCGGCGGCGTGGGTCAGGTCCAGCTCCTGGCACAAGC 
3901 CCTCGACCACCTCGTCCGGCCACGGGCCGCGCGGCTCCCACAACAGCCGGACACCGGCCG 
3961 GCCGGCGCGCTCGGGCGCAGAACTCACGCAGTCGCGCGATGGCGGGTTCGGTCGGCCGGA 
4021 AACTCGCCGGGCACTGCAG


FIG. 13A 
METHOD FOR CREATING
POLYNUCLEOTIDE AND POLYPEPTIDE 
SEQUENCES 

CROSS-REFERENCES TO RELATED 
APPLICATIONS 

This application derives priority from U.S. Ser. No. 
60/067,908, filed Dec. 8, 1997, which is incorporated by 
reference in its entirety for all purposes. 

STATEMENT OF GOVERNMENT INTEREST 

The invention described herein was made in the perfor- 
mance of work under a NASA contract, and is subject to the 
provisions of Public Law 96-517 (35 USC §202) in which 
the contractor has elected to retain title. 

COPYRIGHT NOTICE 

A portion of the disclosure of this patent document 
contains material which is subject to copyright protection. 
The copyright owner has no objection to the facsimile 
reproduction by anyone of the patent document or the patent 
disclosure as it appears in the Patent and Trademark Office 
patent file or records, but otherwise reserves all copyright 
rights whatsoever. 

TECHNICAL FIELD 

The invention resides in the technical field of genetics, 
and more specifically, forced molecular evolution of poly- 
nucleotides to acquire desired properties. 

BACKGROUND 

A variety of approaches, including rational design and 
directed evolution, have been used to optimize protein 
functions (1, 2). The choice of approach for a given opti- 
mization problem depends, in part, on the degree of under- 
standing of the relationships between sequence, structure 
and function. Rational redesign typically requires extensive 
knowledge of a structure-function relationship. Directed
evolution requires little or no specific knowledge about the
structure-function relationship; rather, the essential feature
is a means to evaluate the function to be optimized. Directed
evolution involves the generation of libraries of mutant 
molecules followed by selection or screening for the desired 
function. Gene products which show improvement with 
respect to the desired property or set of properties are 
identified by selection or screening. The gene(s) encoding 
those products can be subjected to further cycles of the 
process in order to accumulate beneficial mutations. This 
evolution can involve few or many generations, depending 
on how far one wishes to progress and the effects of 
mutations typically observed in each generation. Such 
approaches have been used to create novel functional 
nucleic acids (3, 4), peptides and other small molecules (3), 
antibodies (3), as well as enzymes and other proteins (5, 6, 
7). These procedures are fairly tolerant to inaccuracies and 
noise in the function evaluation (7). 
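
The cycle described above (generate a library of mutants, screen for the desired function, carry the best variant into the next generation) can be sketched as a toy simulation. This is only an illustrative sketch, not a procedure taken from this document: the function names, the substitution rate, the library size and the fitness function (similarity to a hypothetical optimum sequence) are all assumptions chosen for the example.

```python
import random

ALPHABET = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of seq in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the three other bases.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            out.append(base)
    return "".join(out)


def fitness(seq, target):
    """Toy screen (assumption): count positions matching a hypothetical optimum."""
    return sum(a == b for a, b in zip(seq, target))


def directed_evolution(parent, target, generations=20, library_size=200,
                       rate=0.02, seed=0):
    """Iterate mutagenesis and screening, keeping the best variant each round."""
    rng = random.Random(seed)
    best = parent
    history = [fitness(best, target)]
    for _ in range(generations):
        library = [mutate(best, rate, rng) for _ in range(library_size)]
        library.append(best)  # retain the current parent so fitness cannot decrease
        best = max(library, key=lambda s: fitness(s, target))
        history.append(fitness(best, target))
    return best, history
```

Because the parent is kept in every library, the recorded fitness never decreases from one generation to the next, which mirrors the accumulation of beneficial mutations over successive cycles.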

Several publications have discussed the role of gene 
recombination in directed evolution (see WO 97/07205, WO 
98/42727, U.S. Pat. No. 5,807,723, U.S. Pat. No. 5,721,367, 
U.S. Pat. No. 5,776,744, WO 98/41645, U.S. Pat. No.
5,811,238, WO 98/41622, WO 98/41623, and U.S. Pat. No. 
5,093,257). 

A PCR-based group of recombination methods consists of
DNA shuffling [5, 6], the staggered extension process [89, 90]
and random-priming recombination [87]. Such methods
typically involve synthesis of significant amounts of DNA
during the assembly/recombination step and subsequent
amplification of the final products, and the efficiency of
amplification decreases as gene size increases.

Yeast cells, which possess an active system for homologous
recombination, have been used for in vivo recombination.
Cells transformed with a vector and partially overlapping
inserts efficiently join the inserts together in the regions
of homology and restore a functional, covalently-closed
plasmid [91]. This method does not require PCR amplification
at any stage of recombination and therefore is free from
the size considerations inherent in PCR-based methods. However, the
number of crossovers introduced in one recombination event
is limited by the efficiency of transformation of one cell with
multiple inserts. Other in vivo recombination methods entail
recombination between two parental genes cloned on the
same plasmid in a tandem orientation. One method relies on
the homologous recombination machinery of bacterial cells to
produce chimeric genes [92]. A first gene in the tandem
provides the N-terminal part of the target protein, and a
second provides the C-terminal part. However, only one
crossover can be generated by this approach. Another in vivo
recombination method uses the same tandem organization of
substrates in a vector [93]. Before transformation into E. coli
cells, plasmids are linearized by endonuclease digestion
between the parental sequences. Recombination is performed
in vivo by the enzymes responsible for double-strand
break repair. The ends of linear molecules are degraded by
a 5'→3' exonuclease activity, followed by annealing of
complementary single-strand 3' ends and restoration of the
double-strand plasmid [94]. This method has advantages and
disadvantages similar to those of tandem recombination on a
circular plasmid.


SUMMARY OF THE INVENTION 

The invention provides methods for evolving a polynucle- 
otide toward acquisition of a desired property. Such methods 
entail incubating a population of parental polynucleotide 
variants under conditions to generate annealed polynucleotides
comprising heteroduplexes. The heteroduplexes are
then exposed to a cellular DNA repair system to convert the 
heteroduplexes to parental polynucleotide variants or 
recombined polynucleotide variants. The resulting poly- 
nucleotides are then screened or selected for the desired 
property. 

In some methods, the heteroduplexes are exposed to a 
DNA repair system in vitro. A suitable repair system can be 
prepared in the form of cellular extracts.

In other methods, the products of annealing including 
heteroduplexes are introduced into host cells. The hetero- 
duplexes are thus exposed to the host cells’ DNA repair 
system in vivo. 

In several methods, the introduction of annealed products
into host cells selects for transformed cells comprising
heteroduplexes relative to transformed cells comprising
homoduplexes. Such selection can be
achieved, for example, by providing a first polynucleotide
variant as a component of a first vector, and a second
polynucleotide variant as a component of a
second vector. The first and second vectors are converted to
linearized forms in which the first and second polynucleotide
variants occur at opposite ends. In the incubating step,
single-stranded forms of the first linearized vector reanneal
with each other to form linear first vector, single-stranded
forms of the second linearized vector reanneal with each
other to form linear second vector, and single-stranded



US 6,537,746 B2 



linearized forms of the first and second vectors anneal with
each other to form a circular heteroduplex bearing a nick in each
strand. Introduction of the products into cells thus selects for
circular heteroduplexes relative to the linear first and second
vector. Optionally, in the above methods, the first and second 
vectors can be converted to linearized forms by PCR. 
Alternatively, the first and second vectors can be converted 
to linearized forms by digestion with first and second 
restriction enzymes. 

In some methods, polynucleotide variants are provided in 
double stranded form and are converted to single stranded 
form before the annealing step. Optionally, such conversion 
is by conducting asymmetric amplification of the first and 
second double stranded polynucleotide variants to amplify a 
first strand of the first polynucleotide variant, and a second 
strand of the second polynucleotide variant. The first and 
second strands anneal in the incubating step to form a 
heteroduplex. 

In some methods, a population of polynucleotides com- 
prising first and second polynucleotides is provided in 
double stranded form, and the method further comprises 
incorporating the first and second polynucleotides as com- 
ponents of first and second vectors, whereby the first and 
second polynucleotides occupy opposite ends of the first and 
second vectors. In the incubating step single-stranded forms 
of the first linearized vector reanneal with each other to form 
linear first vector, single-stranded forms of the second 
linearized vector reanneal with each other to form linear 
second vector, and single-stranded linearized forms of the
first and second vectors anneal with each other to form a circular
heteroduplex bearing a nick in each strand. The introducing
step selects for transformed cells comprising the circular
heteroduplexes relative to the linear first and second vector.

In some methods, the first and second polynucleotides are 
obtained from chromosomal DNA. In some methods, the 
polynucleotide variants encode variants of a polypeptide. In 
some methods, the population of polynucleotide variants 
comprises at least 20 variants. In some methods, the polynucleotide
variants in the population are at least 10 kb in length.

In some methods, the polynucleotide variants comprise
natural variants. In other methods, the polynucleotide variants
comprise variants generated by mutagenic PCR or
cassette mutagenesis. In some methods, the host cells into
which heteroduplexes are introduced are bacterial cells. In
some methods, the population of polynucleotide
variants comprises at least 5 polynucleotides having at least
90% sequence identity with one another.

Some methods further comprise a step of at least partially 
demethylating variant polynucleotides. Demethylation can
be achieved by PCR amplification or by passaging variants 
through methylation-deficient host cells. 

Some methods include a further step of sealing one or 
more nicks in heteroduplex molecules before exposing the 
heteroduplexes to a DNA repair system. Nicks can be sealed 
by treatment with DNA ligase. 

Some methods further comprise a step of isolating a 
screened recombinant polynucleotide variant. In some
methods, the polynucleotide variant is screened to produce 
a recombinant protein or a secondary metabolite whose 
production is catalyzed thereby. 

In some methods, the recombinant protein or secondary 
metabolite is formulated with a carrier to form a pharma- 
ceutical composition. 

In some methods, the polynucleotide variants encode
enzymes selected from the group consisting of proteases,
lipases, amylases, cutinases, cellulases, oxidases,
peroxidases and phytases. In other methods, the polynucleotide
variants encode a polypeptide selected from the group
consisting of insulin, ACTH, glucagon, somatostatin,
somatotropin, thymosin, parathyroid hormone, pigmentary
hormones, somatomedin, erythropoietin, luteinizing
hormone, chorionic gonadotropin, hypothalamic releasing
factors, antidiuretic hormones, thyroid stimulating hormone,
relaxin, interferon, thrombopoietin (TPO), and prolactin.

In some methods, each polynucleotide in the population 
of variant polynucleotides encodes a plurality of enzymes 
forming a metabolic pathway. 

BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 illustrates the process of heteroduplex formation 
using polymerase chain reaction (PCR) with one set of 
primers for each different sequence to amplify the target 
sequence and vector. 

FIG. 2 illustrates the process of heteroduplex formation
using restriction enzymes to linearize the target sequences
and vector.

FIG. 3 illustrates a process of heteroduplex formation
using asymmetric or single primer polymerase chain reaction
(PCR) with one set of primers for each different
sequence to amplify the target sequence and vector.

FIG. 4 illustrates heteroduplex recombination using 
unique restriction enzymes (X and Y) to remove the homo- 
duplexes. 

FIG. 5 shows the amino acid sequences of the FlaA from
R. lupini (SEQ ID NO:1) and R. meliloti (SEQ ID NO:2).

FIG. 6 shows the locations of the unique restriction sites 
utilized to linearize pRL20 and pRM40. 

FIGS. 7 A, B, C and D show the DNA sequences of four
mosaic flaA genes created by in vitro heteroduplex formation
followed by in vivo repair ((a) is SEQ ID NO:3, (b) is
SEQ ID NO:4, (c) is SEQ ID NO:5 and (d) is SEQ ID NO:6).

FIG. 8 illustrates how the heteroduplex repair process
created mosaic flaA genes containing sequence information
from both parent genes.

FIG. 9 shows the physical maps of Actinoplanes utahensis 
ECB deacylase mutants with enhanced specific activity ((a) 
is pM7-2 for Mutant 7-2, and (b) is pM16 for Mutant 16). 

FIG. 10 illustrates the process used for Example 2 to
recombine mutations in Mutant 7-2 and Mutant 16 to yield
an ECB deacylase recombinant with further enhanced specific
activity.

FIG. 11 shows the specific activities of wild-type ECB deacylase
and improved mutants Mutant 7-2, Mutant 16 and recombined
Mutant 15.

FIG. 12 shows the positions of DNA base changes and amino acid
substitutions in recombined ECB deacylase Mutant 15 with
respect to parental sequences of Mutant 7-2 and Mutant 16.

FIGS. 13 A, B, C, D and E show the DNA sequence of the
A. utahensis ECB deacylase mutant M-15 gene created
by in vitro heteroduplex formation followed by in vivo
repair (SEQ ID NO:7).

FIG. 14 illustrates the process used for Example 3 to
recombine mutations in RC1 and RC2 to yield thermostable
subtilisin E.

FIG. 15 illustrates the sequences of RC1 and RC2 and the
ten clones picked randomly from the transformants of the
reaction products of duplex formation as described in
Example 3. The x's correspond to base positions that differ
between RC1 and RC2. The mutation at 995 corresponds to an
amino acid substitution at 181, while that at 1107 corresponds
to an amino acid substitution at 218 in the subtilisin
protein sequence.

FIG. 16 shows the results of screening 400 clones from
the library created by heteroduplex formation and repair for
initial activity (A_i) and residual activity (A_r). The ratio A_i/A_r
was used to estimate the enzymes' thermostability. Data
from active variants are sorted and plotted in descending
order. Approximately 12.9% of the clones exhibit a phenotype
corresponding to the double mutant containing both the
N181D and the N218S mutations.

DEFINITIONS 

Screening is, in general, a two-step process in which one 
first physically separates the cells and then determines which 
cells do and do not possess a desired property. Selection is 
a form of screening in which identification and physical 
separation are achieved simultaneously by expression of a 
selection marker, which, in some genetic circumstances, 
allows cells expressing the marker to survive while other 
cells die (or vice versa). Exemplary screening members 
include luciferase, β-galactosidase and green fluorescent
protein. Selection markers include drug and toxin resistance 
genes. Although spontaneous selection can and does occur 
in the course of natural evolution, in the present methods 
selection is performed by man. 

An exogenous DNA segment is one foreign (or 
heterologous) to the cell or homologous to the cell but in a 
position within the host cell nucleic acid in which the 
element is not ordinarily found. Exogenous DNA segments 
are expressed to yield exogenous polypeptides. 

The term gene is used broadly to refer to any segment of 
DNA associated with a biological function. Thus, genes 
include coding sequences and/or the regulatory sequences 
required for their expression. Genes also include nonex- 
pressed DNA segments that, for example, form recognition 
sequences for other proteins. 

The term “wild-type” means that the nucleic acid frag- 
ment does not comprise any mutations. A “wild-type” protein
means that the protein will be active at a level of activity
found in nature and typically will comprise the amino acid 
sequence found in nature. In an aspect, the term “wild type” 
or “parental sequence” can indicate a starting or reference 
sequence prior to a manipulation of the invention. 

“Substantially pure” means an object species is the predominant
species present (i.e., on a molar basis it is more
abundant than any other individual macromolecular species
in the composition), and preferably a substantially purified
fraction is a composition wherein the object species comprises
at least about 50 percent (on a molar basis) of all
macromolecular species present. Generally, a substantially
pure composition will comprise more than about 80 to 90
percent of all macromolecular species present in the composition.
Most preferably, the object species is purified to
essential homogeneity (contaminant species cannot be
detected in the composition by conventional detection
methods) wherein the composition consists essentially of a
single macromolecular species. Solvent species, small molecules
(<500 Daltons), and elemental ion species are not
considered macromolecular species.

Percentage sequence identity is calculated by comparing 
two optimally aligned sequences over the window of 
comparison, determining the number of positions at which 
the identical nucleic acid base occurs in both sequences to 
yield the number of matched positions, dividing the number
of matched positions by the total number of positions in the
window of comparison, and multiplying the result by 100 to
yield the percentage. Optimal alignment of sequences for
aligning a comparison window can be conducted by computerized
implementations of algorithms GAP, BESTFIT,
FASTA, and TFASTA in the Wisconsin Genetics Software 
Package Release 7.0, Genetics Computer Group, 575 Sci- 
ence Dr., Madison, Wis. 
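
The percentage identity calculation defined above can be illustrated with a short sketch. It assumes that the two sequences have already been optimally aligned by some external program and padded with "-" for gaps; the function name and the choice to count a gapped column as a non-match are assumptions made for this example, and the result is scaled by 100 to express it as a percentage.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over a comparison window of two pre-aligned sequences.

    Both inputs must have the same length; '-' marks an alignment gap and a
    column containing a gap is never counted as a match (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("comparison window is empty")
    # Number of positions at which the identical base occurs in both sequences.
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    # Matched positions divided by total positions in the window, as a percentage.
    return 100.0 * matches / len(aligned_a)
```

For instance, two aligned 4-base windows that agree at three positions give a value of 75.0.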

The term naturally-occurring is used to describe an object 
that can be found in nature as distinct from being artificially 
produced by man. For example, a polypeptide or polynucle- 
otide sequence that is present in an organism (including 
viruses) that can be isolated from a source in nature and 
which has not been intentionally modified by man in the 
laboratory is naturally-occurring. Generally, the term 
naturally-occurring refers to an object as present in a non-pathological
(undiseased) individual, such as would be
typical for the species. 

A nucleic acid is operably linked when it is placed into a 
functional relationship with another nucleic acid sequence. 
For instance, a promoter or enhancer is operably linked to a 
coding sequence if it increases the transcription of the 
coding sequence. Operably linked means that the DNA 
sequences being linked are typically contiguous and, where 
necessary to join two protein coding regions, contiguous and 
in reading frame. However, since enhancers generally func- 
tion when separated from the promoter by several kilobases 
and intronic sequences may be of variable lengths, some 
polynucleotide elements may be operably linked but not 
contiguous. 

A specific binding affinity between, for example, a ligand 
and a receptor, means a binding affinity of at least 1x10^ 

The term “cognate” as used herein refers to a gene 
sequence that is evolutionarily and functionally related 
between species. For example but not limitation, in the 
human genome, the human CD4 gene is the cognate gene to 
the mouse CD4 gene, since the sequences and structures of 
these two genes indicate that they are highly homologous 
and both genes encode a protein which functions in signaling
T cell activation through MHC class II-restricted antigen
recognition. 

The term “heteroduplex” refers to hybrid DNA generated 
by base pairing between complementary single strands 
derived from the different parental duplex molecules, 
whereas the term “homoduplex” refers to double-stranded
DNA generated by base pairing between complementary 
single strands derived from the same parental duplex mol- 
ecules. 
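
The distinction between heteroduplex and homoduplex can be made concrete by enumerating the duplexes that can form when the strands of several parental molecules are separated and allowed to pair again. The sketch below is a simplified model resting on assumptions not stated in this document: parents are present in equimolar amounts, every top strand pairs with every bottom strand with equal probability, and the parent labels are hypothetical.

```python
from itertools import product


def annealing_products(parents):
    """List every (top-strand parent, bottom-strand parent) pairing and its kind."""
    out = []
    for top, bottom in product(parents, repeat=2):
        kind = "homoduplex" if top == bottom else "heteroduplex"
        out.append((top, bottom, kind))
    return out


def heteroduplex_fraction(parents):
    """Fraction of pairings that are heteroduplexes under random, unbiased pairing."""
    products = annealing_products(parents)
    hetero = sum(1 for _, _, kind in products if kind == "heteroduplex")
    return hetero / len(products)
```

Under these assumptions two parents give equal numbers of homoduplex and heteroduplex pairings, and the heteroduplex fraction for n distinct parents is 1 - 1/n.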

The term “nick” in duplex DNA refers to the absence of 
a phosphodiester bond between two adjacent nucleotides on 
one strand. The term “gap” in duplex DNA refers to an 
absence of one or more nucleotides in one strand of the 
duplex. The term “loop” in duplex DNA refers to one or 
more unpaired nucleotides in one strand. 

A mutant or variant sequence is a sequence showing 
substantial variation from a wild type or reference sequence 
that differs from the wild type or reference sequence at one 
or more positions. 

DETAILED DESCRIPTION 
I. General 

The invention provides methods of evolving a polynucle- 
otide toward acquisition of a desired property. The substrates 
for the method are a population of at least two polynucle- 
otide variant sequences that contain regions of similarity 
with each other but that also have point(s) or regions of
divergence. The substrates are annealed in vitro at the
regions of similarity. Annealing can regenerate initial substrates
or can form heteroduplexes, in which component
strands originate from different parents. The products of
annealing are exposed to enzymes of a DNA repair system, and
optionally a replication system, that repair unmatched
pairings. Exposure can be in vivo, as when annealed products
are transformed into host cells and exposed to the host's DNA
repair system. Alternatively, exposure can be in vitro, as 
when annealed products are exposed to cellular extracts 
containing functional DNA repair systems. Exposure of 
heteroduplexes to a DNA repair system results in DNA 
repair at bulges in the heteroduplexes due to DNA mis- 
matching. The repair process differs from homologous 
recombination in promoting nonreciprocal exchange of 
diversity between strands. The DNA repair process is typi- 
cally effected on both component strands of a heteroduplex 
molecule and at any particular mismatch is typically random 
as to which strand is repaired. The resulting population can 
thus contain recombinant polynucleotides encompassing an 
essentially random reassortment of points of divergence 
between parental strands. The population of recombinant 
polynucleotides is then screened for acquisition of a desired 
property. The property can be a property of the polynucle- 
otide per se, such as capacity of a DNA molecule to bind to 
a protein or can be a property of an expression product 
thereof, such as mRNA or a protein. 
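
The random reassortment of points of divergence described above can be sketched as a small simulation. This is a deliberately simplified model, not the repair mechanism itself: it assumes that both parental sequences are written in the same orientation and have equal length, that each mismatch is repaired independently, and that either strand is chosen as the template with equal probability. In real repair systems neighbouring mismatches may be corrected together, which this sketch ignores. All names are hypothetical.

```python
import random


def repair_heteroduplex(parent_a, parent_b, rng):
    """Resolve each mismatch in a heteroduplex toward one parent at random."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have equal length")
    product = []
    for a, b in zip(parent_a, parent_b):
        # Matched positions are unchanged; mismatches take either parental base.
        product.append(a if a == b else rng.choice((a, b)))
    return "".join(product)


def repair_library(parent_a, parent_b, n, seed=0):
    """Simulate n independent repair events and return the resulting sequences."""
    rng = random.Random(seed)
    return [repair_heteroduplex(parent_a, parent_b, rng) for _ in range(n)]
```

Every sequence in the simulated library carries, at each divergent position, the base of one parent or the other, so the library can contain both parental sequences and mosaics of the two.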

II. Substrates For Shuffling 

The substrates for shuffling are variants of a reference 
polynucleotide that show some region(s) of similarity with 
the reference and other region(s) or point(s) of divergence. 
Regions of similarity should be sufficient to support anneal- 
ing of polynucleotides such that stable heteroduplexes can 
be formed. Variant forms often show substantial sequence
identity with each other (e.g., at least 50%, 75%, 90% or 
99%). There should be at least sufficient diversity between 
substrates that recombination can generate more diverse 
products than there are starting materials. Thus, there must 
be at least two substrates differing in at least two positions. 
The degree of diversity depends on the length of the sub- 
strate being recombined and the extent of the functional 
change to be evolved. Diversity at 0.1-25% of
positions is typical. Recombination of mutations from very 
closely related genes or even whole sections of sequences 
from more distantly related genes or sets of genes can 
enhance the rate of evolution and the acquisition of desirable 
new properties. Recombination to create chimeric or mosaic 
genes can be useful in order to combine desirable features of 
two or more parents into a single gene or set of genes, or to 
create novel functional features not found in the parents. The 
number of different substrates to be combined can vary 
widely in size from two to 10, 100, 1000, to more than 10^, 
10^, or 10^ members. 
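
The requirement that substrates differ in at least two positions can be checked with a short calculation. The sketch below assumes two equal-length substrates compared position by position and assumes that each divergent position can independently take either parental base, so that n divergent positions allow 2^n sequences, two of which are the parents. The function names are hypothetical.

```python
def divergent_positions(seq_a, seq_b):
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != y]


def diversity_summary(seq_a, seq_b):
    """Summarize divergence and the number of sequences recombination could yield."""
    diffs = divergent_positions(seq_a, seq_b)
    n = len(diffs)
    possible = 2 ** n  # each divergent position takes one of two parental bases
    return {
        "positions": diffs,
        "percent_divergence": 100.0 * n / len(seq_a) if seq_a else 0.0,
        "possible_products": possible,
        # Subtract the two parental sequences; with fewer than two differences
        # no product distinct from the parents can arise.
        "novel_products": max(possible - 2, 0),
    }
```

With a single divergent position the only possible products are the two parents, whereas two divergent positions already allow two recombinants that neither parent represents.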

The initial small population of the specific nucleic acid 
sequences having mutations may be created by a number of 
different methods. Mutations may be created by error-prone 
PCR. Error-prone PCR uses low-fidelity polymerization 
conditions to introduce a low level of point mutations 
randomly over a long sequence. Alternatively, mutations can 
be introduced into the template polynucleotide by
oligonucleotide-directed mutagenesis. In oligonucleotide-directed
mutagenesis, a short sequence of the polynucleotide
is removed from the polynucleotide using restriction enzyme
digestion and is replaced with a synthetic polynucleotide in
which various bases have been altered from the original
sequence. The polynucleotide sequence can also be altered
by chemical mutagenesis. Chemical mutagens include, for
example, sodium bisulfite, nitrous acid, hydroxylamine,
hydrazine or formic acid. Other agents which are analogues
of nucleotide precursors include nitrosoguanidine,
5-bromouracil, 2-aminopurine, or acridine. Generally, these
agents are added to the PCR reaction in place of the
nucleotide precursor thereby mutating the sequence. Intercalating
agents such as proflavine, acriflavine, quinacrine
and the like can also be used. Random mutagenesis of the
polynucleotide sequence can also be achieved by irradiation
with X-rays or ultraviolet light. Generally, plasmid DNA or
DNA fragments so mutagenized are introduced into E. coli
and propagated as a pool or library of mutant plasmids.

Alternatively the small mixed population of specific 
nucleic acids can be found in nature in the form of different 
alleles of the same gene or the same gene from different 
related species (i.e., cognate genes). Alternatively, substrates 
can be related but nonallelic genes, such as the immunoglo- 
bulin genes. Diversity can also be the result of previous 
recombination or shuffling. Diversity can also result from 
resynthesizing genes encoding natural proteins with alternative
codon usage.

The starting substrates encode variant forms of sequences 
to be evolved. In some methods, the substrates encode 
variant forms of a protein for which evolution of a new or 
modified property is desired. In other methods, the sub- 
strates can encode variant forms of a plurality of genes 
constituting a multigene pathway. In such methods, variation 
can occur in one or any number of the component genes. In 
other methods, substrates can contain variant segments to
be evolved as DNA or RNA binding sequences. In methods
in which starting substrates contain coding sequences,
any essential regulatory sequences, such as a promoter and 
polyadenylation sequence, required for expression may also 
be present as a component of the substrate. Alternatively, 
such regulatory sequences can be provided as components of 
vectors used for cloning the substrates. 

The starting substrates can vary in length from about 50, 
250, 1000, 10,000, 100,000, 10^6 or more bases. The starting
substrates can be provided in double- or single-stranded
form. The starting substrates can be DNA or RNA and 
analogs thereof. If DNA, the starting substrates can be 
genomic or cDNA. If the substrates are RNA, the substrates 
are typically reverse-transcribed to cDNA before heterodu- 
plex formation. Substrates can be provided as cloned 
fragments, chemically synthesized fragments or PCR ampli- 
fication products. Substrates can derive from chromosomal, 
plasmid or viral sources. In some methods, substrates are 
provided in concatemeric form. 

III. Procedures for Generating Heteroduplexes 

Heteroduplexes are generated from double stranded DNA 
substrates, by denaturing the DNA substrates and incubating 
under annealing conditions. Hybridization conditions for 
heteroduplex formation are sequence-dependent and are
different in different circumstances. Longer sequences 
hybridize specifically at higher temperatures. Generally, 
hybridization conditions are selected to be about 25° C. 
lower than the thermal melting point (Tm) for the specific 
sequence at a defined ionic strength and pH. The Tm is the
temperature (under defined ionic strength, pH, and nucleic 
acid concentration) at which 50% of the probes complemen- 
tary to the target sequence hybridize to the target sequence 
at equilibrium. 
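
The rule of thumb above (anneal about 25° C. below the Tm) can be sketched numerically. The Tm estimate used here is a widely quoted empirical formula for long DNA duplexes and is an assumption of this sketch, not something taken from the present description; the function names are likewise hypothetical, and a measured Tm should be preferred where available.

```python
import math

def estimate_tm_celsius(seq, na_molar):
    """Rough Tm for a long DNA duplex (assumed empirical formula):
    Tm = 81.5 + 16.6*log10[Na+] + 0.41*(%GC) - 675/N
    """
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 675.0 / n

def annealing_temperature(seq, na_molar, offset=25.0):
    """Hybridization temperature chosen `offset` degrees below the estimated Tm."""
    return estimate_tm_celsius(seq, na_molar) - offset
```

For a 100-base sequence with 50% GC content in 0.18 M sodium, this estimate gives a Tm of roughly 83° C. and hence an annealing temperature of roughly 58° C.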

Exemplary conditions for denaturation and renaturation
of double stranded substrates are as follows. Equimolar
concentrations (~1.0-5.0 nM) of the substrates are mixed in
1xSSPE buffer (180 mM NaCl, 1.0 mM EDTA, 10 mM
NaH2PO4, pH 7.4). After heating at 96° C. for 10 minutes,
the reaction mixture is immediately cooled at 0° C. for 5
minutes. The mixture is then incubated at 68° C. for 2-6 hr.
Denaturation and reannealing can also be carried out by the
addition and removal of a denaturant such as NaOH. The 
process is the same for single stranded DNA substrates, 
except that the denaturing step may be omitted for short 
sequences. 
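
Setting up equimolar concentrations requires converting a mass concentration into a molar one. The following is a minimal sketch under the common approximation of about 660 g/mol per base pair of double stranded DNA; that constant, the function names and the representation of the temperature program are assumptions made for illustration.

```python
AVG_BP_MASS = 660.0  # g/mol per base pair of dsDNA (approximation, assumed)

def dsdna_nanomolar(ng_per_ul, length_bp):
    """Convert a dsDNA concentration in ng/ul to nM for a molecule of length_bp."""
    return ng_per_ul * 1e6 / (AVG_BP_MASS * length_bp)

def volume_for_target(stock_nm, target_nm, final_volume_ul):
    """Volume of stock (ul) needed to reach target_nm in final_volume_ul."""
    return target_nm * final_volume_ul / stock_nm

# Exemplary program from the text: (step, temperature in deg C, minutes).
# The text allows 2-6 hr for the last step; 120 min is the lower bound.
ANNEALING_PROGRAM = [("denature", 96, 10), ("chill", 0, 5), ("anneal", 68, 120)]
```

For instance, a 5000 bp linearized plasmid at 8.25 ng/ul corresponds to 2.5 nM, which lies within the exemplary 1.0-5.0 nM range.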

By appropriate design of substrates for heteroduplex 
formation, it is possible to achieve selection for heterodu- 
plexes relative to reformed parental homoduplexes. Homo- 
duplexes merely reconstruct parental substrates and effec- 
tively dilute recombinant products in subsequent screening 
steps. In general, selection is achieved by designing
substrates such that heteroduplexes are formed as open circles,
whereas homoduplexes are formed as linear molecules. A
subsequent transformation step results in substantial
enrichment (e.g., 100-fold) for the circular heteroduplexes.

FIG. 1 shows a method in which two substrate sequences
in separate vectors are PCR-amplified using two different
sets of primers (P1, P2 and P3, P4). Typically, first and
second substrates are inserted into separate copies of the
same vector. The two different pairs of primers initiate
amplification at different points on the two vectors. FIG. 1
shows an arrangement in which the P1/P2 primer pair
initiates amplification at one of the two boundaries of the
vector with the substrate, and the P3/P4 primer pair initiates
amplification at the other boundary in a second vector. The two
primers in each primer pair prime amplification in opposite
directions around a circular plasmid. The amplification
products generated by this amplification are double-stranded
linearized vector molecules in which the first and second
substrates occur at opposite ends of the vector. The
amplification products are mixed, denatured and annealed. Mixing
and denaturation can be performed in either order.
Reannealing generates two linear homoduplexes, and an open
circular heteroduplex containing one nick in each strand, at
the initiation point of PCR amplification. Introduction of the
amplification products into host cells selects for the
heteroduplexes relative to the homoduplexes because the former
transform much more efficiently than the latter.
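
The topological argument of this scheme can be illustrated with a small model. This is a simplified sketch: strands are described only by the position on the circular vector map at which their linear form begins, both vectors are assumed to have the same total length, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Strand:
    substrate: str  # label of the substrate carried by this strand
    start: int      # map position where the linearized strand begins
    length: int     # total length of vector plus substrate

def anneal(top: Strand, bottom: Strand) -> dict:
    """Predict the topology of the duplex formed by two complementary strands."""
    if top.length != bottom.length:
        raise ValueError("strands must come from vectors of equal length")
    hetero = top.substrate != bottom.substrate
    if top.start == bottom.start:
        # Ends coincide: the duplex is a blunt linear molecule.
        return {"heteroduplex": hetero, "topology": "linear", "nicks": []}
    # Ends are offset: the duplex closes into a circle with one nick per strand.
    return {"heteroduplex": hetero, "topology": "open circle",
            "nicks": sorted([top.start, bottom.start])}
```

Strands amplified from the same vector share a start point and reanneal as linear homoduplexes, whereas strands from vectors amplified at different points yield the nicked open circle that transforms efficiently.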

It is not essential in the above scheme that amplification
is initiated at the interface between substrate and the rest of
the vector. Rather, amplification can be initiated at any
points on two vectors bearing substrates provided that the
amplification is initiated at different points between the
vectors. In the general case, such amplification generates
two linearized vectors in which the first and second
substrates respectively occupy different positions relative to the
remainder of the vector. Denaturation and reannealing
generate heteroduplexes similar to that shown in FIG. 1, except
that the nicks occur within the vector component rather than
at the interface between plasmid and substrate. Initiation of
amplification outside the substrate component of a vector
has the advantage that it is not necessary to design primers
specific for the substrate borne by the vector.

Although FIG. 1 is exemplified for two substrates, the 
above scheme can be extended to any number of substrates.
For example, an initial population of vectors bearing
substrates can be divided into two pools. One pool is
PCR-amplified from one set of primers, and the other pool from
another. The amplification products are denatured and
annealed as before. Heteroduplexes can form containing one
strand from any substrate in the first pool and one strand
from any substrate in the second pool. Alternatively, three or
more substrates cloned into multiple copies of a vector can
be subjected to amplification with amplification in each
vector starting at a different point. For each substrate, this
process generates amplification products varying in how
flanking vector DNA is divided on the two sides of the
substrate. For example, one amplification product might
have most of the vector on one side of the substrate, another
amplification product might have most of the vector on the
other side of the substrate, and a further amplification
product might have an equal division of vector sequence
flanking the substrate. In the subsequent annealing step, a
strand of substrate can form a circular heteroduplex with a
strand of any other substrate, but strands of the same
substrate can only reanneal with each other to form a linear
homoduplex. In a still further variation, multiple substrates
can be recombined by performing multiple iterations of the
scheme in FIG. 1. After the first iteration, recombinant
polynucleotides in a vector undergo heteroduplex formation
with a third substrate incorporated into a further copy of the
vector. The vector bearing the recombinant polynucleotides
and the vector bearing the third substrate are separately PCR
amplified from different primer pairs. The amplification
products are then denatured and annealed. The process can
be repeated further times to allow recombination with
further substrates.
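
The pooled and multi-substrate variations reduce to one rule: two substrates can form a circular heteroduplex only if their vectors were linearized at different points. A minimal sketch of that rule follows; the function name and the dictionary layout (substrate label mapped to amplification start point) are assumptions for illustration.

```python
from itertools import combinations

def circular_pairs(start_points):
    """List the substrate pairs able to form circular heteroduplexes.

    start_points maps each substrate label to the map position at which
    amplification of its vector was initiated. Substrates in the same pool
    share a start point and can only yield linear duplexes with each other.
    """
    return [(a, b)
            for a, b in combinations(sorted(start_points), 2)
            if start_points[a] != start_points[b]]
```

With two pools (for example s1 and s2 amplified from one primer set, s3 and s4 from another), only cross-pool pairs are returned; with every substrate given its own start point, every pair is returned.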

An alternative scheme for heteroduplex formation is 
shown in FIG. 2. Here, first and second substrates are 
incorporated into separate copies of a vector. The two copies 
are then respectively digested with different restriction 
enzymes. FIG. 2 shows an arrangement in which the
restriction enzymes cut at opposite boundaries between
substrates and vector, but all that is necessary is to use two
different restriction enzymes that cut at different places.
Digestion generates linearized first and second vectors
bearing first and second substrates, the first and second substrates
occupying different positions relative to the remaining
vector sequences. Denaturation and reannealing generates open
circular heteroduplexes and linear homoduplexes. The
scheme can be extended to recombination between more
than two substrates using analogous strategies to those
described with respect to FIG. 1. In one variation, two pools
of substrates are formed, and each is separately cloned into
a vector. The two pools are then cut with different enzymes,
and annealing proceeds as for two substrates. In another
variation, three or more substrates can be cloned into three
or more copies of a vector, and the three or more resulting
molecules cut with three or more enzymes, cutting at three
or more sites. This generates three or more different linearized vector
forms differing in the division of vector sequences flanking 
the substrate moiety in the vectors. Alternatively, any num- 
ber of substrates can be recombined pairwise in an iterative 
fashion with products of one round of recombination anneal- 
ing with a fresh substrate in each round. 

In a further variation, heteroduplexes can be formed from
substrate molecules in vector-free form, and the
heteroduplexes subsequently cloned into vectors. This can be
achieved by asymmetric amplification of first and second
substrates as shown in FIG. 3. Asymmetric or single primer
PCR amplifies only one strand of a duplex. By appropriate
selection of primers, opposite strands can be amplified from
two different substrates. On reannealing amplification
products, heteroduplexes are formed from opposite strands
of the two substrates. Because only one strand is amplified
from each substrate, reannealing does not reform
homoduplexes (other than for small quantities of unamplified
substrate). The process can be extended to allow
recombination of any number of substrates using analogous
strategies to those described with respect to FIG. 1. For example,
substrates can be divided into two pools, and each pool
subjected to the same asymmetric amplification, such that
amplification products of one pool can only anneal with
amplification products of the other pool, and not with each
other. Alternatively, shuffling can proceed pairwise in an
iterative manner, in which recombinants formed from
heteroduplexes of first and second substrates are subsequently
subjected to heteroduplex formation with a third substrate.
Point mutations can also be introduced at a desired level 
during PCR amplification. 
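
The remark that only small quantities of homoduplex reform after asymmetric amplification can be made concrete with a back-of-the-envelope model. The sketch assumes ideal single-primer PCR (one new copy of the amplified strand per template per cycle), equal starting amounts of the two substrates, and random pairing of top and bottom strands on annealing; these assumptions and the function names are introduced here and are not taken from the description.

```python
def strand_copies(templates, cycles):
    """Copies of the amplified and non-amplified strand after ideal single-primer PCR."""
    amplified = templates * (cycles + 1)  # linear, not exponential, growth
    unamplified = templates               # the opposite strand is never copied
    return amplified, unamplified

def homoduplex_fraction(cycles):
    """Fraction of annealed duplexes that are homoduplexes under random pairing.

    Substrate A contributes mostly top strands and substrate B mostly bottom
    strands, so same-substrate pairings require one of the rare strands.
    """
    major, minor = strand_copies(1, cycles)
    total = major + minor
    return 2.0 * major * minor / (total * total)
```

Without amplification (0 cycles) half of the duplexes are homoduplexes; after 30 cycles the model predicts about 6%.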

FIG. 4 shows another approach of selecting for hetero- 
duplexes relative to homoduplexes. First and second
substrates are isolated by PCR amplification from separate
vectors. The substrates are denatured and allowed to anneal 
forming both heteroduplexes and reconstructed homodu- 
plexes. The products of annealing are digested with restric- 
tion enzymes X and Y. X has a site in the first substrate but 
not the second substrate, and vice versa for Y. Enzyme X 
cuts reconstructed homoduplex from the first substrate and 
enzyme Y cuts reconstructed homoduplex from the second 
substrate. Neither enzyme cuts heteroduplexes. Heterodu- 
plexes can effectively be separated from restriction frag- 
ments of homoduplexes by further cleavage with enzymes A 
and B having sites proximate to the ends of both the first and 
second substrates, and ligation of the products into vector 
having cohesive ends compatible with ends resulting from 
digestion with A and B. Only heteroduplexes cut with A and 
B can ligate with the vector. Alternatively, heteroduplexes 
can be separated from restriction fragments of homodu- 
plexes by size selection on gels. The above process can be 
generalized to N substrates by cleaving the mixture of 
heteroduplexes and homoduplexes with N enzymes, each 
one of which cuts a different substrate and no other
substrate.

Heteroduplexes can be formed by directional cloning.
Two substrates for heteroduplex formation can be obtained
by PCR amplification of chromosomal DNA and joined to
opposite ends of a linear vector. Directional cloning can be
achieved by digesting the vector with two different enzymes,
and digesting or adapting first and second substrates to be
respectively compatible with cohesive ends of only one of the
two enzymes used to cut the vector. The first and second
substrates can thus be ligated at opposite ends of a linearized
vector fragment. This scheme can be extended to any
number of substrates by using principles analogous to those
described for FIG. 1. For example, substrates can be divided
into two pools before ligation to the vector. Alternatively,
recombinant products formed by heteroduplex formation of
first and second substrates can subsequently undergo
heteroduplex formation with a third substrate.
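
Returning to the restriction-based selection of FIG. 4, its logic (each enzyme cuts only the reconstructed homoduplex of the one substrate bearing its site, and no enzyme cuts a heteroduplex) can be sketched as follows. The function name and the data layout are hypothetical, and the model ignores partial digestion.

```python
from itertools import product

def surviving_duplexes(substrates, enzyme_targets):
    """Return the (top, bottom) strand pairings left uncut after digestion.

    enzyme_targets maps an enzyme name to the single substrate that carries
    its recognition site. A duplex is cut only when both strands derive from
    a substrate targeted by some enzyme, i.e. a reconstructed homoduplex.
    """
    targeted = set(enzyme_targets.values())
    survivors = []
    for top, bottom in product(substrates, repeat=2):
        is_homoduplex = top == bottom
        if is_homoduplex and top in targeted:
            continue  # site intact on both strands: the duplex is cleaved
        survivors.append((top, bottom))
    return survivors
```

With N substrates and N enzymes, one per substrate, only pairings of strands from different substrates survive, which corresponds to the generalization stated above.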

IV. Vectors and Transformation 

In general, substrates are incorporated into vectors either 
before or after the heteroduplex formation step. A variety of 
cloning vectors typically used in genetic engineering are 
suitable. 

The vectors containing the DNA segments of interest can 
be transferred into the host cell by standard methods, 
depending on the type of cellular host. For example, calcium
chloride transformation is commonly utilized for
prokaryotic cells, whereas calcium phosphate treatment,
lipofection, or electroporation may be used for other
cellular hosts. Other methods used to transform mammalian
cells include the use of Polybrene, protoplast fusion,
liposomes, electroporation, microinjection, and
biolistics (see, generally, Sambrook et al., supra). Viral
vectors can also be packaged in vitro and introduced by
infection. The choice of vector depends on the host cells. In
general, a suitable vector has an origin of replication
recognized in the desired host cell, a selection marker capable of
being expressed in the intended host cells and/or regulatory
sequences to support expression of genes within substrates
being shuffled.

V. Types of Host Cells 

In general any type of cells supporting DNA repair and 
replication of heteroduplexes introduced into the cells can be 
used. Cells of particular interest are the standard cell types
commonly used in genetic engineering, such as bacteria,
particularly E. coli (16, 17). Suitable E. coli strains include
E. coli mutS, mutL, dam-, and/or recA+, E. coli XL-10-Gold
([Tetr Δ(mcrA)183 Δ(mcrCB-hsdSMR-mrr)173 endA1
supE44 thi-1 recA1 gyrA96 relA1 lac Hte] [F' proAB
lacIqZΔM15 Tn10 (Tetr) Amy Camr]), E. coli ES1301 mutS
[Genotype: lacZ53, mutS201::Tn5, thyA36, rha-5, metB1,
deoC, IN(rrnD-rrnE)] (20, 24, 28-42). Preferred E. coli
strains are E. coli SCS110 [Genotype: rpsL, (Strr), thr, leu,
endA, thi-1, lacY, galK, galT, ara, tonA, tsx, dam, dcm, supE44,
Δ(lac-proAB), [F', traD36, proA+B+ lacIqZΔM15]], which
have normal cellular mismatch repair systems (17). This
strain type repairs mismatches and unmatches in the
heteroduplex with little strand-specific preference. Further,
because this strain is dam- and dcm-, plasmid isolated from
the strain is unmethylated and therefore particularly
amenable for further rounds of DNA duplex formation/mismatch
repair (see below). Other suitable bacterial cells include
gram-negative and gram-positive, such as Bacillus,
Pseudomonas, and Salmonella.

Eukaryotic organisms are also able to carry out mismatch
repair (43-48). Mismatch repair systems in both prokaryotes
and eukaryotes are thought to play an important role in the
maintenance of genetic fidelity during DNA replication,
in the outcome of genetic recombinations, and in genome
stability. Some of the genes that play important roles in
mismatch repair in prokaryotes, particularly mutS and mutL,
have homologs in eukaryotes. Wild-type or
mutant S. cerevisiae has been shown to carry out mismatch
repair of heteroduplexes (49-56), as have COS-1 monkey
cells (57). Preferred strains of yeast are Pichia and
Saccharomyces. Mammalian cells have been shown to have the
capacity to repair G-T to G-C base pairs by a short-patch
mechanism (38, 58-63). Mammalian cells (e.g., mouse,
hamster, primate, human), both cell lines and primary
cultures, can also be used. Such cells include stem cells,
including embryonic stem cells, zygotes, fibroblasts,
lymphocytes, Chinese hamster ovary (CHO), mouse
fibroblasts (NIH3T3), kidney, liver, muscle, and skin cells. Other
eucaryotic cells of interest include plant cells, such as maize,
rice, wheat, cotton, soybean, sugarcane, tobacco, and
arabidopsis; fish, algae, fungi (aspergillus, podospora,
neurospora), insect (e.g., baculo lepidoptera) (see,
Winnacker, “From Genes to Clones,” VCH Publishers, New
York, (1987), which is incorporated herein by reference).

In vivo repair occurs in a wide variety of prokaryotic and 
eukaryotic cells. Use of mammalian cells is advantageous in
certain applications in which substrates encode polypeptides
that are expressed only in mammalian cells or which are 
intended for use in mammalian cells. However, bacterial and 
yeast cells are advantageous for screening large libraries due 
to the higher transformation frequencies attainable in these 
strains. 

V. In Vitro DNA Repair Systems 

As an alternative to introducing annealed products into 
host cells, annealed products can be exposed to a DNA repair
system in vitro. The DNA repair system can be obtained as
extracts from repair-competent E. coli, yeast or any other 
cells (64-67). Repair-competent cells are lysed in appropri- 
ate buffer and supplemented with nucleotides. DNA is 
incubated in this cell extract and transformed into competent 
cells for replication. 

VI. Screening and Selection 

After introduction of annealed products into host cells, the 
host cells are typically cultured to allow repair and replica- 
tion to occur and optionally, for genes encoded by poly- 
nucleotides to be expressed. The recombinant polynucle- 
otides can be subject to further rounds of recombination 
using the heteroduplex procedures described above, or other 
shuffling methods described below. However, whether after 
one cycle of recombination or several, recombinant poly- 
nucleotides are subjected to screening or selection for a 
desired property. In some instances, screening or selection is
performed in the same host cells that are used for DNA
repair. In other instances, recombinant polynucleotides, their 
expression products or secondary metabolites produced by 
the expression products are isolated from such cells and 
screened in vitro. In other instances, recombinant polynucle- 
otides are isolated from the host cells in which recombina- 
tion occurs and are screened or selected in other host cells. 
For example, in some methods, it is advantageous to allow 
DNA repair to occur in a bacterial host strain, but to screen 
an expression product of recombinant polynucleotides in 
eucaryotic cells. The recombinant polynucleotides surviving 
screening or selection are sometimes useful products in 
themselves. In other instances, such recombinant polynucle- 
otides are subjected to further recombination with each other 
or other substrates. Such recombination can be effected by 
the heteroduplex methods described above or any other 
shuffling methods. Further round(s) of recombination are 
followed by further rounds of screening or selection on an 
iterative basis. Optionally, the stringency of selection can be 
increased at each round. 
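
The iterative structure described above (recombination, then screening or selection, repeated with optionally increasing stringency) can be summarized as a generic loop. The sketch is deliberately abstract: `recombine` and `score` are caller-supplied placeholders standing in for a recombination procedure and an assay, and the names are assumptions introduced here.

```python
def evolve(library, recombine, score, thresholds):
    """Run one round of recombination and selection per threshold.

    library    : starting collection of variants
    recombine  : function mapping a library to a (usually larger) library
    score      : function assigning a numeric value of the desired property
    thresholds : minimum score required to survive each successive round
    """
    for threshold in thresholds:
        library = recombine(library)
        library = [variant for variant in library if score(variant) >= threshold]
        if not library:
            break  # nothing survived; raising the stringency further is pointless
    return library
```

Passing an increasing sequence of thresholds corresponds to raising the stringency of selection at each round.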

The nature of screening or selection depends on the 
desired property sought to be acquired. Desirable properties 
of enzymes include high catalytic activity, capacity to confer 
resistance to drugs, high stability, the ability to accept a 
wider (or narrower) range of substrates, or the ability to 
function in nonnatural environments such as organic sol- 
vents. Other desirable properties of proteins include capacity 
to bind a selected target, secretion capacity, capacity to 
generate an immune response to a given target, lack of 
immunogenicity and toxicity to pathogenic microorganisms. 
Desirable properties of DNA or RNA polynucleotide
sequences include capacity to specifically bind a given
protein target, and capacity to regulate expression of oper- 
ably linked coding sequences. Some of the above properties, 
such as drug resistance, can be selected by plating cells on 
the drug. Other properties, such as the influence of a
regulatory sequence on expression, can be screened by 
detecting appearance of the expression product of a reporter 
gene linked to the regulatory sequence. Other properties, 
such as capacity of an expressed protein to be secreted, can 
be screened by FACS™, using a labelled antibody to the 
protein. Other properties, such as immunogenicity or lack 
thereof, can be screened by isolating protein from individual 
cells or pools of cells, and analyzing the protein in vitro or 
in a laboratory animal. 

VII. Variations 

1. Demethylation 

Most cell types methylate DNA in some manner, with the 
pattern of methylation differing between cell types. Sites of
methylation include 5-methylcytosine (m5C),
N4-methylcytosine (m4C), N6-methyladenine (m6A),
5-hydroxymethylcytosine (hm5C) and
5-hydroxymethyluracil (hm5U). In E. coli, methylation is
effected by Dam and Dcm enzymes. The methylase specified
by the dam gene methylates the N6-position of the adenine
residue in the sequence GATC, and the methylase specified
by the dcm gene methylates the C5-position of the internal
cytosine residue in the sequence CCWGG. DNA from plants
and mammals is often subject to CG methylation, meaning
that CG or CNG sequences are methylated. Possible effects
of methylation on cellular repair are discussed by references
18-20. 

In some methods, DNA substrates for heteroduplex
formation are at least partially demethylated on one or both
strands, preferably the latter. Demethylation of substrate
DNA promotes efficient and random repair of the
heteroduplexes. In heteroduplexes formed with one strand
dam-methylated and one strand unmethylated, repair is biased to
the unmethylated strand, with the methylated strand serving
as the template for correction. If neither strand is
methylated, mismatch repair occurs, but shows
insignificant strand preference (23, 24).
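
The recognition sequences and the strand-bias rule just described lend themselves to a short sketch. The site patterns (GATC for Dam, CCWGG with W = A or T for Dcm) come from the text; the function names, the returned labels and the handling of the doubly methylated case, which the text does not address, are assumptions.

```python
import re

DAM_SITE = re.compile(r"GATC")
DCM_SITE = re.compile(r"CC[AT]GG")  # W stands for A or T

def methylation_sites(seq):
    """Return 0-based start positions of Dam and Dcm recognition sites in seq."""
    seq = seq.upper()
    return {
        "dam": [m.start() for m in DAM_SITE.finditer(seq)],
        "dcm": [m.start() for m in DCM_SITE.finditer(seq)],
    }

def corrected_strand(top_methylated, bottom_methylated):
    """Which strand mismatch repair is expected to correct, per the rule above."""
    if top_methylated and not bottom_methylated:
        return "bottom"   # methylated top strand serves as template
    if bottom_methylated and not top_methylated:
        return "top"
    if not top_methylated and not bottom_methylated:
        return "either"   # insignificant strand preference
    return "not specified"  # both strands methylated: not covered by the text
```

Scanning a substrate for GATC sites gives a rough indication of how many positions could direct strand-biased repair if the DNA were prepared from a dam+ host.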

Demethylation can be performed in a variety of ways. In 
some methods, substrate DNA is demethylated by
PCR-amplification. In some instances, DNA demethylation is
accomplished in one of the PCR steps in the heteroduplex
formation procedures described above. In other methods, an
additional PCR step is performed to effect demethylation. In
other methods, demethylation is effected by passaging
substrate DNA through methylation-deficient host cells (e.g., an
E. coli dam- dcm- strain). In other methods, substrate DNA
is demethylated in vitro using demethylating enzymes.
Demethylated DNA is used for heteroduplex formation
using the same procedures described above. Heteroduplexes
are subsequently introduced into DNA-repair-proficient but 
restriction-enzyme-defective cells to prevent degradation of 
the unmethylated heteroduplexes. 

2. Sealing Nicks 

Several of the methods for heteroduplex formation
described above result in circular heteroduplexes bearing 
nicks in each strand. These nicks can be sealed before 
introducing heteroduplexes into host cells. Sealing can be 
effected by treatment with DNA ligase under standard 
ligating conditions. Ligation forms a phosphodiester bond to
link two adjacent bases separated by a nick in one strand of
a DNA double helix. Sealing of nicks increases the
frequency of recombination after introduction of
heteroduplexes into host cells.

3. Error Prone PCR Attendant To Amplification

Several of the formats described above include a PCR 
amplification step. Optionally, such a step can be performed 
under mutagenic conditions to induce additional diversity 
between substrates. 

VIII. Other Shuffling Methods 

The methods of heteroduplex formation described above 
can be used in conjunction with other shuffling methods. For 
example, one can perform one cycle of heteroduplex 
shuffling, screening or selection, followed by a cycle of
shuffling by another method, followed by a further cycle of
screening or selection. Other shuffling formats are described
by WO 95/22625; U.S. Pat. No. 5,605,793; U.S. Pat. No.
5,811,238; WO 96/19256; Stemmer, Science 270, 1510
(1995); Stemmer et al., Gene, 164, 49-53 (1995); Stemmer,
Bio/Technology, 13, 549-553 (1995); Stemmer, Proc. Natl.
Acad. Sci. USA 91, 10747-10751 (1994); Stemmer, Nature
370, 389-391 (1994); Crameri et al., Nature Medicine,
2(1): 1-3, (1996); Crameri et al., Nature Biotechnology 14,
315-319 (1996); WO 98/42727; WO 98/41622; WO
98/05764 and WO 98/42728, WO 98/27230 (each of which
is incorporated by reference in its entirety for all purposes).

IX. Protein Analogs 

Proteins isolated by the methods also serve as lead 
compounds for the development of derivative compounds. 
The derivative compounds can include chemical modifica- 
tions of amino acids or replace amino acids with chemical 
structures. The analogs should have a stabilized electronic 
configuration and molecular conformation that allows key 
functional groups to be presented in substantially the same 
way as a lead protein. In particular, the non-peptidic
compounds have spatial electronic properties which are
comparable to the polypeptide binding region, but will typically be
much smaller molecules than the polypeptides, frequently
having a molecular weight below about 2 kD and
preferably below about 1 kD. Identification of such non-peptidic
compounds can be performed through several standard
methods such as self-consistent field (SCF) analysis,
configuration interaction (CI) analysis, and normal mode
dynamics analysis. Computer programs for implementing 
these techniques are readily available. See Rein et al., 
Computer-Assisted Modeling of Receptor-Ligand Interac- 
tions (Alan Liss, New York, 1989). 

IX. Pharmaceutical Compositions 

Polynucleotides, their expression products, and secondary 
metabolites whose formation is catalyzed by expression 
products, generated by the above methods are optionally 
formulated as pharmaceutical compositions. Such a
composition comprises one or more active agents, and a
pharmaceutically acceptable carrier. A variety of aqueous carriers
can be used, e.g., water, buffered water, phosphate-buffered
saline (PBS), 0.4% saline, 0.3% glycine, human albumin
solution and the like. These solutions are sterile and
generally free of particulate matter. The compositions may contain
pharmaceutically acceptable auxiliary substances as 
required to approximate physiological conditions such as pH 
adjusting and buffering agents, toxicity adjusting agents and 
the like, for example, sodium acetate, sodium chloride, 
potassium chloride, calcium chloride and sodium lactate. The
concentration of active agent is selected
primarily based on fluid volumes, viscosities, and so forth, 
in accordance with the particular mode of administration 
selected. 

EXAMPLES

Example 1 

Novel Rhizobium FlaA Genes from Recombination
of Rhizobium lupini FlaA and Rhizobium meliloti
FlaA

Bacterial flagella have a helical filament, a proximal hook 
and a basal body with the flagellar motor (68). This basic 
design has been extensively examined in E. coli and S.
typhimurium and is broadly applicable to many other
bacteria as well as some archaea. The long helical filaments are
polymers assembled from flagellin subunits, whose
molecular weights range between 20,000 and 65,000, depending on
the bacterial species (69). Two types of flagellar filaments,
named plain and complex, have been distinguished by their
electron microscopically determined surface structures (70).
Plain filaments have a smooth surface with faint helical
lines, whereas complex filaments exhibit a conspicuous
helical pattern of alternating ridges and grooves. These 
characteristics of complex flagellar filaments are considered 
to be responsible for the brittle and (by implication) rigid 
structure that enables them to propel bacteria efficiently in 
viscous media (71-73). Whereas flagella with plain
filaments can alternate between clockwise and
counterclockwise rotation (68), all known flagella with complex
filaments rotate only clockwise with intermittent stops (74).
Since this latter navigation pattern is found throughout 
bacteria and archaea, it has been suggested that complex 
flagella may reflect the common background of an ancient, 
basic motility design (69). 

Complex flagella differ from plain bacterial flagella in the
fine structure of their filaments, which is dominated by
conspicuous helical bands, and in their fragility; the
filaments are also resistant to heat decomposition (72).
Schmitt et al. (75) showed that
bacteriophage 7-7-1 specifically adsorbs to the complex
flagella of R. lupini H13-3 and requires motility for a
productive infection of its host. Though the flagellins from
R. meliloti and R. lupini are quite similar, bacteriophage
7-7-1 does not infect R. meliloti. Until now complex flagella
have been observed in only three species of soil bacteria:
Pseudomonas rhodos (73), R. meliloti (76), and R. lupini
H13-3 (70, 72). Cells of R. lupini H13-3 possess 5 to 10
peritrichously inserted complex flagella, which were first 
isolated and analyzed by high resolution electron micros- 
copy and by optical diffraction (70). 

Maruyama et al. (77) further found that a higher content 
of hydrophobic amino acid residues in the complex filament 
may be one of the main reasons for the unusual properties of 
complex flagella. By measuring mass per unit length and 
obtaining three-dimensional reconstruction from electron 
micrographs, Trachtenberg et al. (73, 78) suggested that the 
complex filaments of R. lupini are composed of functional 
dimers. FIG. 6 shows the comparison between the deduced 
amino acid sequence of the R. lupini H13-3 FlaA and the
deduced amino acid sequence of the R. meliloti FlaA. 
Perfect matches are indicated by vertical lines, and
conservative exchanges are indicated by colons. The overall
identity is 56%. The R. lupini flaA and R. meliloti flaA were
subjected to in vitro heteroduplex formation followed by in
vivo repair in order to create novel FlaA molecules and
structures. 
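
An overall identity figure such as the 56% quoted above is obtained by counting perfect matches in a pairwise alignment. A minimal sketch follows; it assumes the two sequences are already aligned to equal length with "-" marking gaps and uses the alignment length as denominator, which is only one of several conventions and may not be the one used for FIG. 6. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of alignment columns holding the same residue in both sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(1 for a, b in zip(aligned_a, aligned_b)
                  if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)
```

For example, the toy alignment of "MKT-AL" with "MRTQAL" has four matching columns out of six, or about 66.7% identity.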

A. Methods 

pRL20 containing R. lupini H13-3 flaA gene and pRM40
containing R. meliloti flaA gene are shown in FIGS. 6A and
6B. These plasmids were isolated from E. coli SCS110 (free
from dam- and dcm-type methylation). About 3.0 µg of
unmethylated pRL20 and pRM40 DNA were digested with
Bam HI and Eco RI, respectively, at 37° C. for 1 hour. After
agarose gel separation, the linearized DNA was purified with
Wizard PCR Prep kit (Promega, Wis., USA). Equimolar
concentrations (2.5 nM) of the linearized unmethylated
pRL20 and pRM40 were mixed in 1xSSPE buffer (180 mM
NaCl, 1 mM EDTA, 10 mM NaH2PO4, pH 7.4). After
heating at 96° C. for 10 minutes, the reaction mixture was
immediately cooled at 0° C. for 5 minutes. The mixture was
incubated at 68° C. for 2 hours for heteroduplexes to form.

One microliter of the reaction mixture was used to transform
50 µl of E. coli ES1301 mutS, E. coli SCS110 and E. coli
JM109 competent cells. The transformation efficiency with
E. coli JM109 competent cells was about seven times higher
than that of E. coli SCS110 and ten times higher than that of
E. coli ES1301 mutS, although the overall transformation
efficiencies were 10-200 times lower than those of control
transformations with the closed, covalent and circular pUC19
plasmid.

Two clones were selected at random from the E. coli
SCS110 transformants and two from the E. coli ES1301 mutS
transformants, and plasmid DNA was isolated from these
four clones for further DNA sequencing analysis.

B. Results 

FIG. 7 shows (a) the sequence of SCS01 (clone #1 from the E.
coli SCS110 transformant library), (b) the sequence of
SCS02 (clone #2 from the E. coli SCS110 transformant library),
(c) the sequence of ES01 (clone #1 from the E. coli ES1301
transformant library), and (d) the sequence of ES02 (clone
#2 from the E. coli ES1301 transformant library). All four
sequences were different from the wild-type R. lupini flaA and
R. meliloti flaA sequences. Clones SCS02, ES01 and ES02
all contain a complete open reading frame, but SCS01 was
truncated. FIG. 8 shows that recombination mainly occurred
in the loop regions (unmatched regions). The flaA mutant
library generated from R. meliloti flaA and R. lupini flaA can
be transformed into E. coli SCS110, ES1301, XL10-Gold
and JM109, and the transformants screened for functional FlaA
recombinants.

Example 2 

Directed Evolution of ECB Deacylase for Variants
with Enhanced Specific Activity

Streptomyces are among the most important industrial 
microorganisms due to their ability to produce numerous 
important secondary metabolites (including many 
antibiotics) as well as large amounts of enzymes. The
approach described here can be used with little modification
for directed evolution of native Streptomyces enzymes,
some or all of the genes in a metabolic pathway, as well as
other heterologous enzymes expressed in Streptomyces.

New antifungal agents are critically needed by the large
and growing numbers of immune-compromised AIDS,
organ transplant and cancer chemotherapy patients who
suffer opportunistic infections. Echinocandin B (ECB), a
lipopeptide produced by some species of Aspergillus, has
been studied extensively as a potential antifungal. Various
antifungal agents with significantly reduced toxicity have
been generated by replacing the linoleic acid side chain of A.
nidulans echinocandin B with different aryl side chains
(79-83). The cyclic hexapeptide ECB nucleus precursor for
the chemical acylation is obtained by enzymatic hydrolysis
of ECB using Actinoplanes utahensis ECB deacylase. To
maximize the conversion of ECB into intact nucleus, this
reaction is carried out at pH 5.5 with a small amount of
miscible organic solvent to solubilize the ECB substrate.
The product cyclic hexapeptide nucleus is unstable at pH
above 5.5 during the long incubation required to fully
deacylate ECB (84). The pH optimum of ECB deacylase,
however, is 8.0-8.5, and its activity is reduced at pH 5.5 and
in the presence of more than 2.5% ethanol (84). To improve
production of ECB nucleus it is necessary to increase the
activity of the ECB deacylase under these process-relevant
conditions.

Relatively little is known about ECB deacylase. The
enzyme is a heterodimer whose two subunits are derived by
processing of a single precursor protein (83). The 19.9 kD
α-subunit is separated from the 60.4 kD β-subunit by a
15-amino acid spacer peptide that is removed along with a
signal peptide and another spacer peptide in the native
organism. The polypeptide is also expressed and processed
into functional enzyme in Streptomyces lividans, the organism
used for large-scale conversion of ECB by recombinant
ECB deacylase. The three-dimensional structure of the
enzyme has not been determined, and its sequence shows so
little similarity to other possibly related enzymes such as
penicillin acylase that a structural model reliable enough to
guide a rational effort to engineer the ECB deacylase will be
difficult to build. We therefore decided to use directed
evolution (85) to improve this important activity.

Protocols suitable for mutagenic PCR and random-priming
recombination of the 2.4 kb ECB deacylase gene
(73% G+C) have been described recently (86). Here, we
further describe the use of heteroduplex recombination to
generate new ECB deacylase variants with enhanced specific
activity.

In this case, two Actinoplanes utahensis ECB deacylase
mutants, M7-2 and M16, which show higher specific activity
at pH 5.5 and in the presence of 10% MeOH, were recombined
using the technique of in vitro heteroduplex formation
and in vivo mismatch repair.

FIG. 12 shows the physical maps of plasmids pM7-2 and
pM16, which contain the genes for the M7-2 and M16 ECB
deacylase mutants. Mutant M7-2 was obtained through
mutagenic PCR performed directly on whole Streptomyces
lividans cells containing the wild-type ECB deacylase gene,
expressed from plasmid pSHP150-2*. Streptomyces with
pM7-2 show 1.5 times the specific activity of cells expressing
the wild-type ECB deacylase (86). Clone pM16 was
obtained using the random-priming recombination technique
as described (86, 87). It shows 2.4 times the specific
activity of the wild-type ECB deacylase clone.

A. Methods: 

M7-2 and M16 plasmid DNA (pM7-2 and pM16) (FIG. 9)
were purified from E. coli SCS110 (in separate reactions).
About 5.0 µg of unmethylated M7-2 and M16 DNA were
digested with Xho I and Psh AI, respectively, at 37° C. for
1 hour (FIG. 10). After agarose gel separation, the linearized
DNA was purified using a Wizard PCR Prep Kit (Promega,
Wis., USA). Equimolar concentrations (2.0 nM) of the
linearized unmethylated pM7-2 and pM16 DNA were mixed
in 1×SSPE buffer (1×SSPE: 180 mM NaCl, 1.0 mM EDTA,
10 mM NaH2PO4, pH 7.4). After heating at 96° C. for 10
minutes, the reaction mixture was immediately cooled at 0° C.
for 5 minutes. The mixture was then incubated at 68° C. for 3
hours to promote formation of heteroduplexes.

One microliter of the reaction mixture was used to transform
50 µl of E. coli ES1301 mutS, SCS110 and JM109
competent cells. All transformants from E. coli ES1301
mutS were pooled, and all transformants from E. coli SCS110
were pooled separately. A plasmid pool was isolated from each
pooled library, and this pool was used to transform S. lividans
TK23 protoplasts to form a mutant library for deacylase
activity screening.
Transformants from the S. lividans TK23 libraries were
screened for ECB deacylase activity with an in situ plate
assay. Transformed protoplasts were allowed to regenerate
on R2YE agar plates for 24 hr at 30° C. and to develop in
the presence of thiostrepton for 48 hours. When the colonies
grew to the proper size, 6 ml of 0.7% agarose solution
containing 0.5 mg/ml ECB in 0.1 M sodium acetate buffer
(pH 5.5) was poured on top of each R2YE-agar plate and
allowed to develop for 18-24 hr at 30° C. Colonies surrounded
by a clearing zone larger than that of a control
colony containing wild-type plasmid pSHP150-2* were
selected for further characterization.

Selected transformants were inoculated into 20 ml of
medium containing thiostrepton and grown aerobically at
30° C. for 48 hours, at which point they were analyzed for
ECB deacylase activity using HPLC. 100 µl of whole broth
was used for a reaction at 30° C. for 30 minutes in 0.1 M
NaAc buffer (pH 5.5) containing 10% (v/v) MeOH and 200
µg/ml of ECB substrate. The reactions were stopped by
adding 2.5 volumes of methanol, and 20 µl of each sample
was analyzed by HPLC on a 100×4.6 mm polyhydroxyethyl
aspartamide column (PolyLC Inc., Columbia, Md.,
USA) at room temperature using a linear acetonitrile gradient
starting with 50:50 of A:B (A=93% acetonitrile, 0.1%
phosphoric acid; B=70% acetonitrile, 0.1% phosphoric acid)
and ending with 30:70 of A:B in 22 min at a flow rate of 2.2
ml/min. The areas of the ECB and ECB nucleus peaks were
calculated and subtracted from the areas of the corresponding
peaks from a sample culture of S. lividans containing
pIJ702* in order to estimate the ECB deacylase activity.
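
The gradient described above is given as mixing ratios of two eluents, so the acetonitrile content at any point in the run follows by linear interpolation. The sketch below is illustrative arithmetic only. The function names are hypothetical, and it assumes that the A:B ratio changes linearly with time over the 22-minute run and that the stated percentages mix additively.

```python
def fraction_b(t_min: float, start_b: float = 0.50, end_b: float = 0.70,
               duration_min: float = 22.0) -> float:
    """Fraction of eluent B at time t_min for a linear gradient (clamped to the run)."""
    t = min(max(t_min, 0.0), duration_min)
    return start_b + (end_b - start_b) * t / duration_min


def acetonitrile_percent(t_min: float, a_pct: float = 93.0,
                         b_pct: float = 70.0) -> float:
    """Overall acetonitrile percentage of the mixed eluent at time t_min."""
    fb = fraction_b(t_min)
    return (1.0 - fb) * a_pct + fb * b_pct


if __name__ == "__main__":
    for t in (0, 11, 22):
        print(f"{t:>2} min: {acetonitrile_percent(t):.1f}% acetonitrile")
```

Under these assumptions the run starts at about 81.5% acetonitrile and ends at about 76.9%, which shows how shallow the gradient is.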

2.0 ml pre-cultures of positive mutants were used to 
inoculate 50-ml medium and allowed to grow at 30° C. for 
96 hr. The supernatants were further concentrated to 1/30 
their original volume using an Amicon filtration unit 
(Beverly, Mass., USA) with molecular weight cutoff of 10 
kD. The resulting enzyme samples were diluted with an
equal volume of 50 mM KH2PO4 (pH 6.0) buffer and were
applied to a Hi-Trap ion exchange column (Pharmacia
Biotech, Piscataway, N.J., USA). The binding buffer was 50
mM KH2PO4 (pH 6.0), and the elution buffer was 50 mM
KH2PO4 (pH 6.0) containing 1.0 M NaCl. A linear gradient
from 0 to 1.0 M NaCl was applied in 8 column volumes with
a flow rate of 2.7 ml/min. The ECB deacylase fraction
eluting at 0.3 M NaCl was concentrated and the buffer was
exchanged for 50 mM KH2PO4 (pH 6.0) using Centricon-10
units. Enzyme purity was verified by SDS-PAGE using 
Coomassie Blue stain, and the concentration was determined 
using the Bio-Rad Protein Assay Reagent (Hercules, Calif., 
USA). 

A modified HPLC assay was used to determine the 
activities of the ECB deacylase mutants on ECB substrate 
(84). Four of each purified ECB deacylase mutant was 
used for activity assay reaction at 30° C. for 30 minutes in 
0.1 M NaAc buffer (pH 5.5) containing 10% (v/v) MeOH 
and different concentrations of ECB substrate. Assays were 
performed in duplicate. The reactions were stopped by 
adding 2.5 volumes of methanol, and the HPLC assays were 
carried out as described above. The absorbance values were 
recorded, and the initial rates were calculated by least- 
squares regression of the time progress curves from which 
the Km and the kcat were calculated. 
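
The text does not state how Km and kcat were extracted from the initial rates. One simple possibility, shown here purely as an assumed illustration, is a Hanes-Woolf linearization of the Michaelis-Menten equation, in which [S]/v is regressed on [S] so that the slope is 1/Vmax and the intercept is Km/Vmax. The function names, units, and synthetic data below are hypothetical and are not taken from the experiments described.

```python
def fit_hanes_woolf(substrate, rates):
    """Estimate (Km, Vmax) by least-squares regression of [S]/v on [S].

    Hanes-Woolf form: [S]/v = [S]/Vmax + Km/Vmax.
    """
    xs = list(substrate)
    ys = [s / v for s, v in zip(substrate, rates)]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two data points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    vmax = 1.0 / slope
    km = intercept * vmax
    return km, vmax


def kcat_from_vmax(vmax, enzyme_conc):
    """kcat = Vmax / [E]; both arguments must use consistent units."""
    return vmax / enzyme_conc


if __name__ == "__main__":
    # Synthetic, noise-free rates generated from assumed Km = 50 and Vmax = 2.
    s_values = [10.0, 25.0, 50.0, 100.0, 200.0]
    v_values = [2.0 * s / (50.0 + s) for s in s_values]
    km, vmax = fit_hanes_woolf(s_values, v_values)
    print(f"Km = {km:.2f}, Vmax = {vmax:.2f}")
```

With real, noisy data a direct nonlinear fit of the Michaelis-Menten equation is often preferred, because linearizations weight the measurement errors unevenly.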

Activities as a function of pH were measured for the 
purified ECB deacylases at 30° C. at different pH values: 5, 
5.5 and 6 (0.1 M acetate buffer); 7, 7.5, 8 and 8.5 (0.1 M 
phosphate buffer); 9 and 10 (0.1 M carbonate buffer) using 
the HPLC assay. Stabilities of purified ECB deacylases
were determined at 30° C. in 0.1 M NaAc buffer (pH 5.5)
containing 10% methanol. Samples were withdrawn at different
time intervals, and the residual activity was measured
in the same buffer with the HPLC assay described above.

B. Results

FIG. 11 shows that after one round of applying this
heteroduplex repair technique to the mutant M7-2 and M16
genes, one mutant (M15) from about 500 original transformants
was found to possess 3.1 times the specific activity of
wild type. Wild-type and evolved M15 ECB deacylases
were purified, and their kinetic parameters for deacylation of
ECB were determined by HPLC. The evolved deacylase
M15 has a catalytic rate constant, kcat, increased by 205%.
The catalytic efficiency (kcat/Km) of M20 is enhanced by a
factor of 2.9 over the wild-type enzyme.

Initial rates of deacylation with the wild type and M15 at
different pH values from 5 to 10 were determined at 200
µg/ml of ECB. The recombined M15 is more active than
wild type at pH 5-8. Although the pH dependence of the
enzyme activity in this assay is not strong, there is a definite
shift of 1.0-1.5 units in the optimum to lower pH, as
compared to wild type.

The time course of deactivation of the purified ECB
deacylase mutant M15 was measured in 0.1 M NaAc (pH
5.5) at 30° C. No significant difference in stability was
observed between wild type and mutant M15.

The DNA mutations with respect to the wild-type ECB
deacylase sequence and the positions of the amino acid
substitutions in the evolved variants M7-2, M16 and M15
are summarized in FIG. 12.

The heteroduplex recombination technique can recombine
parent sequences to create novel progeny. Recombination
of the M7-2 and M16 genes yielded M15, whose
activity is higher than that of either parent (FIG. 13). Of the six
base substitutions in M15, five (at positions α50, α71, β57,
β129 and β340) were inherited from M7-2, and the other one
(β30) came from M16.

This approach provides an alternative to existing methods
of DNA recombination and is particularly useful in recombining
large genes or entire operons. This method can be
used to create recombinant proteins to improve their properties
or to study structure-function relationships.

Example 3 

Novel Thermostable Bacillus Subtilis Subtilisin E 
Variants 

This example demonstrates the use of in vitro heteroduplex
formation followed by in vivo repair for combining
sequence information from two different sequences in order
to improve the thermostability of Bacillus subtilis subtilisin
E.

Genes RC1 and RC2 encode thermostable B. subtilis
subtilisin E variants (88). The mutations at base positions
1107 in RC1 and 995 in RC2 (FIG. 14), giving rise to amino
acid substitutions Asn218/Ser (N218S) and Asn181/Asp
(N181D), lead to improvements in subtilisin E thermostability;
the remaining mutations, both synonymous and
nonsynonymous, have no detectable effects on thermostability.
At 65° C., the single variants N181D and N218S have
approximately 3-fold and 2-fold longer half-lives,
respectively, than wild-type subtilisin E, and variants containing
both mutations have half-lives that are 8-fold longer (88).
The different half-lives in a population of subtilisin E
variants can therefore be used to estimate the efficiency with
which sequence information is combined. In particular,
recombination between these two mutations (in the absence
of point mutations affecting thermostability) should generate
a library in which 25% of the population exhibits the
thermostability of the double mutant. Similarly, 25% of the
population should exhibit wild-type-like stability, as N181D
and N218S are eliminated at equal frequency. We used the
fractions of the recombined population as a diagnostic.
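
The 25% figures above follow from treating the two marker positions as independent. The sketch below is a minimal model of that reasoning. The function name and the assumption that each mismatch is resolved to the stabilizing base with probability 0.5, independently of the other, are illustrative choices rather than measured values.

```python
from itertools import product


def genotype_fractions(p_keep_181: float = 0.5, p_keep_218: float = 0.5) -> dict:
    """Expected fractions of double-mutant, single-mutant and wild-type clones.

    p_keep_181 / p_keep_218: assumed probability that repair of the mismatch
    retains the N181D / N218S mutation, with the two sites repaired
    independently of each other.
    """
    labels = ("wild type", "single mutant", "double mutant")
    fractions = {label: 0.0 for label in labels}
    for has_181, has_218 in product((True, False), repeat=2):
        p_181 = p_keep_181 if has_181 else 1.0 - p_keep_181
        p_218 = p_keep_218 if has_218 else 1.0 - p_keep_218
        # Number of retained mutations (0, 1 or 2) selects the class label.
        fractions[labels[has_181 + has_218]] += p_181 * p_218
    return fractions


if __name__ == "__main__":
    for label, fraction in genotype_fractions().items():
        print(f"{label}: {fraction:.0%}")
```

Under this model half of the clones carry exactly one of the two mutations, so a measured double-mutant fraction well below 25% points to non-random or linked repair of the two sites.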

A. Methods

The strategy underlying this example is shown in FIG. 15.

Subtilisin E thermostable mutant genes RC1 and RC2
(FIG. 14) are 986-bp fragments including 45 nt of subtilisin
E prosequence, the entire mature sequence and 113 nt after
the stop codon. The genes were cloned between Bam HI and
Nde I in the E. coli/B. subtilis shuttle vector pBE3, resulting in
pBE3-1 and pBE3-2, respectively. Plasmid DNA pBE3-1
and pBE3-2 was isolated from E. coli SCS110.

About 5.0 µg of unmethylated pBE3-1 and pBE3-2 DNA
were digested with Bam HI and Nde I, respectively, at 37°
C. for 1 hour. After agarose gel separation, equimolar
concentrations (2.0 nM) of the linearized unmethylated
pBE3-1 and pBE3-2 were mixed in 1×SSPE buffer (180 mM
NaCl, 1.0 mM EDTA, 10 mM NaH2PO4, pH 7.4). After
heating at 96° C. for 10 minutes, the reaction mixture was
immediately cooled at 0° C. for 5 min. The mixture was
then incubated at 68° C. for 2 hr for heteroduplexes to form.

One microliter of the reaction mixture was used to transform
50 µl of E. coli ES1301 mutS, E. coli SCS110 and E.
coli HB101 competent cells.

The transformation efficiency with E. coli HB101 competent
cells was about ten times higher than that of E. coli
SCS110 and 15 times higher than that of E. coli ES1301
mutS. In all these cases, however, the transformation efficiencies
were 10-250 times lower than that of the transformation
with closed, covalent and circular control pUC19 plasmids.

Five clones from the E. coli SCS110 mutant library and five
from the E. coli ES1301 mutS library were randomly chosen,
and plasmid DNA was isolated using a QIAprep spin
plasmid miniprep kit for further DNA sequencing analysis.

About 2,000 random clones from the E. coli HB101 mutant
library were pooled, and total plasmid DNA was isolated
using a QIAGEN-100 column. 0.5-4.0 µg of the isolated
plasmid was used to transform Bacillus subtilis DB428 as
described previously (88).

About 400 transformants from the Bacillus subtilis
DB428 library were subjected to screening. Screening was
performed using the assay described previously (88), on
succinyl-Ala-Ala-Pro-Phe-p-nitroanilide. B. subtilis DB428
containing the plasmid library were grown on LB plates
containing kanamycin (20 µg/ml). After 18 hours at
37° C., single colonies were picked into 96-well plates
containing 200 µl of SG/kanamycin medium per well. These
plates were incubated with shaking at 37° C. for 24 hours to
let the cells grow to saturation. The cells were spun down,
and the supernatants were sampled for the thermostability
assay.

Two replicates of 96-well assay plates were prepared for
each growth plate by transferring 10 µl of supernatant into
the replica plates. The subtilisin activities were then measured
by adding 100 µl of activity assay solution (0.2 mM
succinyl-Ala-Ala-Pro-Phe-p-nitroanilide, 100 mM Tris-HCl,
10 mM CaCl2, pH 8.0, 37° C.). Reaction velocities
were measured at 405 nm over 1.0 min in a ThermoMax
microplate reader (Molecular Devices, Sunnyvale, Calif.).
Activity measured at room temperature was used to calculate
the fraction of active clones (clones with activity less
than 10% of that of wild type were scored as inactive). Initial
activity (Ai) was measured after incubating one assay plate
at 65° C. for 10 minutes by immediately adding 100 µl of
prewarmed (37° C.) assay solution (0.2 mM
succinyl-Ala-Ala-Pro-Phe-p-nitroanilide, 100 mM Tris-HCl,
10 mM CaCl2, pH 8.0) into each well. Residual activity (Ar)
was measured after a 40-minute incubation.
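
The ratio of residual to initial activity can be turned into an apparent half-life if thermal inactivation is assumed to follow first-order kinetics, which the text does not state explicitly. The sketch below makes that assumption. The function name and the example values are hypothetical, and the elapsed time between the two measurements is left as a parameter because the text can be read as either 30 or 40 minutes.

```python
import math


def half_life_min(initial_activity: float, residual_activity: float,
                  elapsed_min: float) -> float:
    """Apparent half-life, assuming first-order decay A(t) = A_i * exp(-k * t)."""
    if elapsed_min <= 0:
        raise ValueError("elapsed time must be positive")
    if not 0 < residual_activity < initial_activity:
        raise ValueError("residual activity must be positive and below the initial activity")
    k = math.log(initial_activity / residual_activity) / elapsed_min
    return math.log(2.0) / k


if __name__ == "__main__":
    # Hypothetical readings: activity drops to one quarter over 30 minutes.
    print(f"apparent half-life: {half_life_min(1.0, 0.25, 30.0):.1f} min")
```

For ranking clones on a plate the ratio Ar/Ai alone is sufficient. The conversion to a half-life is useful mainly for comparison with the fold improvements quoted for N181D and N218S.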

B. Results 

In vitro heteroduplex formation and in vivo repair were
carried out as described above. Five clones from the E. coli
SCS110 mutant library and five from the E. coli ES1301 mutS
library were selected at random and sequenced. FIG. 14
shows that four out of the ten clones were different from the
parent genes. The frequency of occurrence of a particular
point mutation from parent RC1 or RC2 in the resulting
genes ranged from 0% to 50%, and the ten point mutations
in the heteroduplex have been repaired without strong
strand-specific preference.

Since none of the ten mutations lies within a dcm
site, mismatch repair appears to proceed generally via the E.
coli long-patch mismatch repair system. This system repairs
different mismatches in a strand-specific manner, using the
state of N6-methylation of adenine in GATC sequences as
the major mechanism for determining the strand to be
repaired. With heteroduplexes methylated at GATC
sequences on only one DNA strand, repair was shown to be
highly biased to the unmethylated strand, with the methylated
strand serving as the template for correction. If neither
strand was methylated, mismatch repair occurred but
showed little strand preference (23, 24). These results show
that it is preferable to demethylate the DNA to be recombined
to promote efficient and random repair of the heteroduplexes.

The rates of subtilisin E thermo-inactivation at 65° C.
were estimated by analyzing the 400 random clones from the
Bacillus subtilis DB428 library. The thermostabilities
obtained from one 96-well plate are shown in FIG. 16,
plotted in descending order. About 12.9% of the clones
exhibited thermostability comparable to the mutant with the
N181D and N218S double mutations. Since this rate is only
half of that expected for random recombination of these two
markers, it indicates that the two mismatches at positions
995 and 1107 within the heteroduplexes have been repaired
with lower position randomness.

Sequence analysis of the clone exhibiting the highest
thermostability among the screened 400 transformants from
the E. coli SCS110 heteroduplex library confirmed the
presence of both N181D and N218S mutations. Among the
400 transformants from the B. subtilis DB428 library that
were screened, approximately 91% of the clones expressed
N181D- and/or N218S-type enzyme stabilities, while about
8.0% of the transformants showed only wild-type subtilisin
E stability.

Fewer than 1.0% of the clones were inactive, indicating that
few new point mutations were introduced in the recombination
process. This is consistent with the fact that no new
point mutations were identified in the ten sequenced genes
(FIG. 14). While point mutations may provide useful diversity
for some in vitro evolution applications, they can also be
problematic for recombination of beneficial mutations,
especially when the mutation rate is high.

Example 4

Optimizing Conditions for the Heteroduplex 
Recombination. 

We have found that the efficiency of heteroduplex recombination
can differ considerably from gene to gene [17,57].
In this example, we investigate and optimize a variety of
parameters that improve recombination efficiency. DNA
substrates used in this example were site-directed mutants of
green fluorescent protein from Aequorea victoria. The GFP
mutants had one or more stop codons introduced at different
locations along the sequence that abolished their fluorescence.
Fluorescent wild-type protein could be restored only by
recombination between two or more mutations. The fraction of
fluorescent colonies was used as a measure of recombination
efficiency.

A. Methods 

About 2-4 µg of each parent plasmid was used in one
recombination experiment. One parent plasmid was digested
with Pst I endonuclease and the other parent with EcoRI.
Linearized plasmids were mixed together and 20×SSPE buffer was
added to a final concentration of 1× (180 mM NaCl, 1 mM
EDTA, 10 mM NaH2PO4, pH 7.4). The reaction mixture was
heated at 96° C. for 4 minutes, immediately transferred to
ice for 4 minutes, and then incubated for 2
hours at 68° C.

Target genes were amplified in a PCR reaction with
primers corresponding to the vector sequence of the pGFP
plasmid. Forward primer: 5'-CCGACTGGAAAGCGGGCAGTG-3';
reverse primer: 5'-CGGGGCTGGCTTAACTATGCGG-3'.
PCR products were mixed together and
purified using a Qiagen PCR purification kit. Purified products
were mixed with 20×SSPE buffer and hybridized as
described above. Annealed products were precipitated with
ethanol or purified on Qiagen columns and digested with
EcoRI and PstI enzymes. Digested products were ligated
into PstI- and EcoRI-digested pGFP vector.

dUTP was added into the PCR reaction at final concentrations of
200 µM, 40 µM, 8 µM, 1.6 µM and 0.32 µM. The PCR reaction and
subsequent cloning procedures were performed as described
above.

Recombinant plasmids were transformed into the XL10 E.
coli strain by a modified chemical transformation method.
Cells were plated on ampicillin-containing LB agar plates
and grown overnight at 37° C., followed by incubation at
room temperature or at 4° C. until fluorescence developed.

B. Results. 

1. Effect of Ligation on Recombination Efficiency. 

Two experiments were performed to test the effect of
breaks in the DNA heteroduplex on the efficiency of recombination.
In one experiment, heteroduplex plasmid was
treated with DNA ligase to close all existing single-strand
breaks and was transformed under conditions identical to those
for an unligated sample (see Table 1). The ligated samples show up
to 7-fold improvement in recombination efficiency over
unligated samples.

In another experiment, dUTP was added into the PCR reaction
to introduce additional breaks into the DNA upon repair by
uracil N-glycosylase in the host cells. Table 2 shows that
dUMP incorporation significantly suppressed
recombination, the extent of suppression increasing with
increased dUTP concentration.

2. Effect of Plasmid Size on the Efficiency of Heterodu- 
plex Formation. 

Plasmid size was a significant factor affecting recombi- 
nation efficiency. Two plasmids pGFP (3.3 kb) and a Bacil- 
lus shuttle vector pCTl (about 9 kb) were used in preparing 
circular heteroduplex-like plasmids following traditional 
heteroduplex protocol. For the purpose of this experiment 
(to study the effect of plasmid size on duplex formation), 
both parents had the same sequences. While pGFP formed 
about 30-40% of circular plasmid, the shuttle vector yielded 
less than 10% of this form. 

An increase in plasmid size decreases the concentration of the
ends in the vicinity of each other and makes annealing of very
long (>0.8 kb) single-stranded ends more difficult.
This difficulty is avoided by the procedure shown in FIG. 3,
in which heteroduplex formation occurs between substrates
in vector-free form, and heteroduplexes are subsequently
inserted into a vector.

3. Efficiency of Recombination vs. Distance Between 
Mutations 

A series of GFP variants was recombined pairwise to
study the effect of distance between mutations on the efficiency
of recombination. Parental genes were amplified by
PCR, annealed and ligated back into the pGFP vector.
Heteroduplexes were transformed into the XL10 E. coli strain.

The first three columns in Table 3 show the results of three
independent experiments and demonstrate the dependence
of recombination efficiency on the distance between mutations.
As expected, recombination becomes less and less
efficient for very close mutations.

However, it is still remarkable that long-patch repair has
been able to recombine mutations separated by only 27 bp.
The last line in Table 3 represents recombination between
one single and one double mutant. Wild-type GFP could
only be restored in the event of a double crossover, with each
individual crossover occurring within a distance of only 99 bp,
demonstrating the ability of this method to recombine
multiple, closely spaced mutations.

4. Elimination of the Parental Double Strands From 
Heteroduplex Preparations. 

Annealing of substrates in vector-free form offers size
advantages relative to annealing of substrates as components
of vectors, but does not allow selection for heteroduplexes
relative to homoduplexes simply by transformation into a
host. Asymmetric PCR reactions with only one primer for
each parent, seeded with an appropriate amount of previously
amplified and purified gene fragment, were run for 100
cycles, ensuring a 100-fold excess of one strand over the
other. Products of these asymmetric reactions were
mixed and annealed together, producing only a minor
amount of nonrecombinant duplexes. The last column in
Table 3 shows the recombination efficiency obtained from
these enriched heteroduplexes. Comparison of the first three
columns with the fourth one demonstrates the improvement
achieved by asymmetric synthesis of the parental strands.
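
The 100-fold excess quoted above is what an idealized single-primer reaction would give, because the newly made strands cannot themselves serve as template for the same primer and the product therefore accumulates linearly rather than exponentially. The sketch below shows this arithmetic. The function name and the per-cycle efficiency parameter are assumptions, and real reactions will fall short of the ideal.

```python
def strand_excess(cycles: int, efficiency: float = 1.0) -> float:
    """Ratio of primed-strand copies to template-strand copies after
    single-primer (linear) amplification.

    Each cycle copies the original template strands once (scaled by the
    assumed efficiency); the copies cannot be primed by the same primer.
    """
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie between 0 and 1")
    # One starting copy of each strand, plus one new primed strand per cycle.
    return 1.0 + cycles * efficiency


if __name__ == "__main__":
    print(f"excess after 100 cycles: {strand_excess(100):.0f}-fold")
    print(f"excess at 50% efficiency: {strand_excess(100, 0.5):.0f}-fold")
```

By contrast, two-primer PCR doubles both strands each cycle and produces no strand excess, which is why the seeded single-primer step is needed to bias annealing toward heteroduplexes.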

While the foregoing invention has been described in some
detail for purposes of clarity and understanding, it will be
clear to one skilled in the art from a reading of this disclosure
that various changes in form and detail can be made without
departing from the true scope of the invention. All publications
and patent documents cited in this application are
incorporated by reference in their entirety for all purposes to
the same extent as if each individual publication or patent
document were so individually denoted.

SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 11

<210> SEQ ID NO 1
<211> LENGTH: 410
<212> TYPE: PRT
<213> ORGANISM: Rhizobium lupini
<220> FEATURE:
<223> OTHER INFORMATION: flagellin A (FlaA)

<400> SEQUENCE: 1

Met Ala Ser Val Leu Thr Asn Ile Asn Ala Met Ser Ala Leu Gln Thr
1               5                   10                  15

Leu Arg Ser Ile Ser Ser Asn Met Glu Asp Thr Gln Ser Arg Ile Ser
            20                  25                  30

Ser Gly Met Arg Val Gly Ser Ala Ser Asp Asn Ala Ala Tyr Trp Ser
        35                  40                  45

Ile Ala Thr Thr Met Arg Ser Asp Asn Ala Ser Leu Ser Ala Val Gln
    50                  55                  60

Asp Ala Ile Gly Leu Gly Ala Ala Lys Val Asp Thr Ala Ser Ala Gly
65                  70                  75                  80

Met Asp Ala Val Ile Asp Val Val Lys Gln Ile Lys Asn Lys Leu Val
                85                  90                  95

Thr Ala Gln Glu Ser Ser Ala Asp Lys Thr Lys Ile Gln Gly Glu Val
            100                 105                 110

Lys Gln Leu Gln Glu Gln Leu Lys Gly Ile Val Asp Ser Ala Ser Phe
        115                 120                 125

Ser Gly Glu Asn Trp Leu Lys Gly Asp Leu Ser Thr Thr Thr Thr Lys
    130                 135                 140

Ser Val Val Gly Ser Phe Val Arg Glu Gly Gly Thr Val Ser Val Lys
145                 150                 155                 160

Thr Ile Asp Tyr Ala Leu Asn Ala Ser Lys Val Leu Val Asp Thr Arg
                165                 170                 175

Ala Thr Gly Thr Lys Thr Gly Ile Leu Asp Thr Ala Tyr Thr Gly Leu
            180                 185                 190

Asn Ala Asn Thr Val Thr Val Asp Ile Asn Lys Gly Gly Val Ile Thr
        195                 200                 205

Gln Ala Ser Val Arg Ala Tyr Ser Thr Asp Glu Met Leu Ser Leu Gly
    210                 215                 220

Ala Lys Val Asp Gly Ala Asn Ser Asn Val Ala Val Gly Gly Gly Ser
225                 230                 235                 240

Ala Ser Ser Arg Ser Thr Ala Ala Gly Leu Arg Val Ala Ser Thr Leu
                245                 250                 255

Arg Pro Pro Ser Pro His Gln His Gln Ser Leu Ala Ser Leu Pro Pro
            260                 265                 270

Leu Thr Pro Pro Leu Lys Leu Val Leu Gln Leu Leu Pro Val Thr Pro
        275                 280                 285

Ser Ser Ser Thr Lys Pro Thr Ala Ala Pro Val Gln Val Asn Leu Thr
    290                 295                 300

Gln Ser Val Leu Thr Met Asp Val Ser Ser Met Ser Ser Thr Asp Val
305                 310                 315                 320

Gly Ser Tyr Leu Thr Gly Val Glu Lys Ala Leu Thr Ser Leu Thr Ser
                325                 330                 335

Ala Gly Ala Glu Leu Gly Ser Ile Lys Gln Arg Ile Asp Leu Gln Val
            340                 345                 350

Asp Phe Ala Ser Lys Leu Gly Asp Ala Leu Ala Lys Gly Ile Gly Arg
        355                 360                 365

Leu Val Asp Ala Asp Met Asn Glu Glu Ser Thr Lys Leu Lys Ala Leu
    370                 375                 380

Gln Thr Gln Gln Gln Leu Ala Ile Gln Ser Leu Ser Ile Ala Asn Ser
385                 390                 395                 400

Asp Ser Gln Asn Ile Leu Ser Leu Phe Arg
                405                 410

<210> SEQ ID NO 2
<211> LENGTH: 394
<212> TYPE: PRT
<213> ORGANISM: Rhizobium meliloti
<220> FEATURE:
<223> OTHER INFORMATION: flagellin A (FlaA)

<400> SEQUENCE: 2

Met Thr Ser Ile Leu Thr Asn Asn Ser Ala Met Ala Ala Leu Ser Thr
1 5 10 15

Leu Arg Ser Ile Ser Ser Ser Met Glu Asp Thr Gln Ser Arg Ile Ser
20 25 30

Ser Gly Leu Arg Val Gly Ser Ala Ser Asp Asn Ala Ala Tyr Trp Ser
35 40 45

Ile Ala Thr Thr Met Arg Ser Asp Asn Gln Ala Leu Ser Ala Val Gln
50 55 60

Asp Ala Leu Gly Leu Gly Ala Ala Lys Val Asp Thr Ala Tyr Ser Gly
65 70 75 80

Met Glu Ser Ala Ile Glu Val Val Lys Glu Ile Lys Ala Lys Leu Val
85 90 95

Ala Ala Thr Glu Asp Gly Val Asp Lys Ala Lys Ile Gln Glu Glu Ile
100 105 110

Thr Gln Leu Lys Asp Gln Leu Thr Ser Ile Ala Glu Ala Ala Ser Phe
115 120 125

Ser Gly Glu Asn Trp Leu Gln Ala Asp Leu Ser Gly Gly Pro Val Thr
130 135 140

Lys Ser Val Val Gly Gly Phe Val Arg Asp Ser Ser Gly Ala Val Ser
145 150 155 160

Val Lys Lys Val Asp Tyr Ser Leu Asn Thr Asp Thr Val Leu Phe Asp
165 170 175

Thr Thr Gly Asn Thr Gly Ile Leu Asp Lys Val Tyr Asn Val Ser Gln
180 185 190

Ala Ser Val Thr Leu Pro Val Asn Val Asn Gly Thr Thr Ser Glu Tyr
195 200 205

Thr Val Gly Ala Tyr Asn Val Asp Asp Leu Ile Asp Ala Ser Ala Thr
210 215 220


Phe Asp Gly Asp Tyr Ala Asn Val [illegible] Ala Gly Ala Leu Ala Gly Asp
225 230 235 240

Tyr Val Lys Val Gln Gly Ser Trp Val Lys Ala Val Asp Val Ala Ala
245 250 255

Thr Gly Gln Glu Val Val Tyr Asp Asp Gly Thr Thr Lys Trp Gly Val
260 265 270

Asp Thr Thr Val Thr Gly Ala Pro Ala Thr Asn Val Ala Ala Pro Ala
275 280 285

Ser Ile Ala Thr Ile Asp Ile Thr Ile Ala Ala Gln Ala Gly Asn Leu
290 295 300

Asp Ala Leu Ile Ala Gly Val Asp Glu Ala Leu Thr Asp Met Thr Ser
305 310 315 320

Ala Ala Ala Ser Leu Gly Ser Ile Ser Ser Arg Ile Asp Leu Gln Ser
325 330 335

Asp Phe Val Asn Lys Leu Ser Asp Ser Ile Asp Ser Gly Val Gly Arg
340 345 350

Leu Val Asp Ala Asp Met Asn Glu Glu Ser Thr Arg Leu Lys Ala Leu
355 360 365

Gln Thr Gln Gln Gln Leu Ala Ile Gln Ala Leu Ser Ile Ala Asn Ser
370 375 380

Asp Ser Gln Asn Val Leu Ser Leu Phe Arg
385 390












<210> SEQ ID NO 3 
<211> LENGTH: 1201 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: SCS01 mosaic
flaA gene created by in vitro heteroduplex
formation followed by in vivo repair

<400> SEQUENCE: 3 


atggcaagcg ttctcacaaa cattaacgca atgtctgctc ttcagacgct gcgttcgatt 60

tcttccaaca tggaagacac ccagagccgt atttccagcg gcatgcgcgt tggttcggct 120

tccgacaacg ccgcttattg gtctatcgcg accaccatgc gctcggacaa tgcctcgctt 180

tccgctgttc aggatgcaat tggcctcggt gccgccaagg tcgataccgc ttcggcgggt 240

atggatgcgg ttatcgatgt tgtaaagcag atcaagaaca aactggtcac tgccaccgaa 300

gacggcgtcg acaaggccaa gatccaagaa gaaatcactc agctcaagga ccagctgacg 360

agcatcgccg acgcggcttc cttctccggt gaaaactggc tcaagggcga tctttccacg 420

acgacaacca aatcagtggt tggctccttc gttcgtgaag gcggtaccgt atcggtcaag 480

accatcgatt acgctctgaa tgcttccaag gttctggtgg atacccgcgc aacgggcacc 540

aagaccggca ttctggacaa ggtctacaac gtctcgcagg caagcgtcac gctgacggtc 600

aacaccaacg gcgtcgaatc ccaggcctcc gtccgcgcct attcgctgga gtccctcacc 660

gaagccggtg cggagttcca gggcaactat gctcttcagg gcggtaacag ctacgtcaag 720

gtcgaaaacg tctgggttcg agctgagacc gcatcaacac cagtcgctgg caagtttgcc 780

gccgcttaca ccgccgctga agctggtact gcagctgctg ccggtgacgc catcatcgtc 840

gacgaaacca acagcggcgc cggtgcaggt aaacctcacc cagtcggtcc tgaccatgga 900

tgtcagctcg atgagctcga cggatgtcgg cagctacctc acgggcgtgg aaaaggctct 960

caccagcctg acgagcgctg gcgctgaact cggctctatc aaacagcgca tcgatctgca 1020 

ggttgatttt gcttccaagc tgggcgacgc tctcgcaaaa ggtattggcc gtctcgttga 1080 

tgctgacatg aatgaagagt ccactaagct taaggctctt cagacgcagc agcagctggc 1140 

tatccagtcg ctctccatcg caaacagcga ctcgcagaac attctgtcgc tgttccgtta 1200 

a 1201 


<210> SEQ ID NO 4 
<211> LENGTH: 1229 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence : SCS02 mosaic 
flaA gene created by in vitro heteroduplex 
formation followed by in vivo repair 

<400> SEQUENCE: 4 


atgacgagca ttctcaccaa caactccgca atggccgcgc tttccggagt gcgctcgatc 60

tcttccagca tggaagacac gcagagccgc atctcctccg gccttcgcgt cggttcggcc 120

tccgacaacg ccgcctactg gtcgattgcg accaccatgc gctccgacaa ccaggccctt 180

tcggccgtcc aggacgccct cggcctcggc gccgccaagg ttgataccgc ctattccggt 240

atggaatcgg cgatcgaagt cgttaaggaa atcaagaaca aactggtcac tgctcaggaa 300

tcttctgccg acaaaacgaa gattcagggc gaagtcaagc agcttcagga gcagttgaag 360

ggcatcgttg attccgcttc cttctccggt gagaactggc tgcaggcgga cctcagcggc 420

ggcgccgtca ccaagagcgt cgtcggctcg ttcgtccgtg acggaagcgg ttccgtagcc 480

gtcaagaagg tcgattacgc tctgaatgct tccaaggttc tggtggatac ccgcgcaacg 540

ggcaccaaga ccggcattct cgatactgct tataccggcc ttaacgcgaa cacggtgacg 600

gttgatatca acaagggcgg cgtgatcacc caggcctccg tccgcgccta ttccacggac 660

gaaatgctct ccctcggcgc aaaggtcgat ggcgcaaaca gcaacgttgc tgttggcggc 720

ggctccgctt cgtcaaggtc gacggcagct gggttaaggg tagcgtcgac gctgcggcct 780

ccatcaccgc atcaaccggc gccaccggtc aagaaatcgc cgccaccacg acggcagctg 840

gtaccatcac tgcagacagc tgggtcgtcg atgtcggcaa cgctcctgcc gccaacgttt 900

cggccggcca gtcggtcgcg aacatcaaca tcgtcggaat gggctcgacg gatgtcggca 960

gctacctcac gggcgtggaa aaggctctca ccagcatgac cagcgctgcc gcctcgctcg 1020

gctccatctc ctcgcgcatc gacctgcaga gcgaattcgt caacaagctc tcggactcga 1080

tcgagtcggg cgtcggccgt ctcgtcgacg cggacatgaa cgaggagtcg acccgcctca 1140

aggccctgca gacccagcag cagctcgcca tccaggccct gtcgatcgcc aactcggact 1200

cgcagaacgt cctgtcgctc ttccgctaa 1229


<210> SEQ ID NO 5 
<211> LENGTH: 1228 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: ES01 mosaic
flaA gene created by in vitro heteroduplex 
formation followed by in vivo repair 

<400> SEQUENCE: 5 

atgacgagca ttctcaccaa caactccgca atggccgcgc tttccggagt gcgctcgatc 60

tcttccagca tggaagacac gcagagccgc atctcctccg gccttcgcgt cggttcggcc 120

tccgacaacg ccgcctactg gtcgattgcg accaccatgc gctccgacaa ccaggccctt 180

tcggccgtcc aggacgccct cggcctcggc gccgccaagg ttgataccgc ctattccggt 240

atggaatcgg cgatcgaagt cgttaaggaa atcaaggcca agctcgtagc tgccaccgaa 300

gacggcgtcg acaaggccaa gatccaagaa gaaatcactc agctcaagga ccagctgacg 360

agcatcgccg acgcggcttc cttctccggt gagaactggc tgcaggcgga cctcagcggc 420

ggcgccgtca ccaagagcgt cgtcggctcg ttcgtccgtg acggaagcgg ttccgtagcc 480

gtcaagacca tcgattacgc tctgaatgct tccaaggttc tggtggatac ccgcgacacg 540

gtcggcgata ccggcattct ggacaaggtc tacaacgtct cgcaggcaag cgtcacgctg 600

acggtcaaca ccaacggcgt cgaatcgcag catacggttg ctgcctattc gctggagtcc 660

ctcaccgaag ccggtgcgga gttccagggc aactatgctc ttcagggcgg taacagctac 720

gtcaaggtcg acggcagctg ggttaagggt agcgtcgacg ctgcggcctc catcaccgca 780

tcaacaccag tcgctggcaa gtttgccgcc gcttacaccg ccgctgaagc tggtactgca 840

gctgctgccg gtgacgccat catcgtcgac gaaaccaaca gcggcgccgg tgcaggtaaa 900

cctcacccag tcggtcctga ccatggatgt cagctcgatg agctcgacgg atgtcggcag 960

ctacctcacg ggcgtggaaa aggctctcac cagcctgacg agcgctggcg ctgaactcgg 1020

ctccatctcc tcgcgcatcg acctgcagag cgaattcgtc aacaagctct cggactcgat 1080

cgagtcgggc gtcggccgtc tcgtcgacgc ggacatgaac gaggagtcga cccgcctcaa 1140

ggccctgcag acccagcagc agctcgccat ccaggccctg tcgatcgcca actcggactc 1200

gcagaacgtc ctgtcgctct tccgctaa 1228


<210> SEQ ID NO 6 
<211> LENGTH: 1209 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence :ES02 mosaic 
flaA gene created by in vitro heteroduplex 
formation followed by in vivo repair 

<400> SEQUENCE: 6 


atgacgagca ttctcaccaa caactccgca atggccgcgc tttccggagt gcgctcgatc 60

tcttccagca tggaagacac gcagagccgc atctcctccg gccttcgcgt cggttcggcc 120

tccgacaacg ccgcctactg gtcgattgcg accaccatgc gctccgacaa ccaggccctt 180

tcggccgtcc aggacgccct cggcctcggc gccgccaagg ttgataccgc ctattccggt 240

atggaatcgg cgatcgaagt cgttaaggaa atcaaggcca agctcgtagc tgccaccgaa 300

gacggcgtcg acaaggccaa gatccaagaa gaaatcactc agctcaagga ccagctgacg 360

agcatcgccg acgcggcttc cttctccggt gagaactggc tgcaggcgga cctcagcggc 420

ggcgccgtca ccaagagcgt cgtcggctcg ttcgtccgtg acggaagcgg ttccgtagcc 480

gtcaagacca tcgattacgc tctgaatgct tccaaggttc tggtggatac ccgcgcaacg 540

ggcaccaaga ccggcattct cgatactgct tataccggcc ttaacgcgaa cacggtgacg 600

gttgatatca acaagggcgg cgtgatcacc caggcctccg tccgcgccta ttccacggac 660

gaaatgctct ccctcaccga agccggtgcg gagttccagg gcaactatgc tcttcagggc 720

ggtaacagct acgtcaaggt cgaaaacgtc tgggttcgag ctgagaccgc tgcaaccggc 780

gccaccggtc aagaaatcgc cgccaccacg acggcagctg gtaccatcac tgcagacagc 840

tgggtcgtcg atgtcggcaa cgctcctgcc gccaacgttt cggccggcca gtcggtcgcg 900 

aacatcaaca tcgtcggaat gggtgcagct gcgctcgatg ccctgatcag cggtgtcgac 960 

gccgctttga cagacatgac cagcgctgcc gcctcgctcg gctccatctc ctcgcgcatc 1020 

gacctgcaga gcgaattcgt caacaagctc tcggactcga tcgagtcggg cgtcggccgt 1080 

ctcgtcgacg cggacatgaa cgaggagtcg acccgcctca aggccctgca gacccagcag 1140 

cagctcgcca tccaggccct gtcgatcgcc aactcggact cgcagaacgt cctgtcgctc 1200 

ttccgctaa 1209 

<210> SEQ ID NO 7 
<211> LENGTH: 4039 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence :Actinoplanes 
utahensis echinocandin B (ECB) deacylase gene 
mutant M-15 created by in vitro heteroduplex 
formation followed by in vivo repair 
<221> NAME/KEY: CDS
<222> LOCATION: (1196)..(3559)

<400> SEQUENCE: 7 


ctgcagcgtg cccagctgtt cgtggtggtg atcgcggccg cgctggccgc cgtcgcggtc 60

gccgccgccg ggccgatcga gttcgtcgcc ttcgtcgtgc cgcagatcgc cctgcggctc 120

tgcggcggca gccggccgcc cctgctcgcc tcggcgatgc tcggcgcgct gctggtggtc 180

ggcgccgacc tggtcgctca gatcgtggtg gcgccgaagg agctgccggt cggcctgctc 240

accgcgatga tcggcacccc gtacctgctc tggctcctgc ttcggcgatc aagaaaggtg 300

agcggatgaa cgcccgcctg cgtggcgagg gcctgcacct cgcgtacggg gacctgaccg 360

tgatcgacgg cctcgacgtc gacgtgcacg acgggctggt caccaccatc atcgggccca 420

acgggtgcgg caagtcgacg ctgctcaagg cgctcggccg gctgctgcgc ccgaccggcg 480

ggcaggtgct gctggacggc cgccgcatcg accggacccc cacccgtgac gtggcccggg 540

tgctcggcgt gctgccgcag tcgcccaccg cgcccgaagg gctcaccgtc gccgacctgg 600

tgatgcgcgg ccggcacccg caccagacct ggttccggca gtggtcgcgc gacgacgagg 660

accaggtcgc cgacgcgctg cgctggaccg acatgctggc gtacgcggac cgcccggtgg 720

acgccctctc cggcggtcag cgccagcgcg cctggatcag catggcgctg gcccagggca 780

ccgacctgct gctgctggac gagccgacca ccttcctcga cctggcccac cagatcgacg 840

tgctggacct ggtccgccgg ctgcacgccg agatgggccg gaccgtggtg atggtgctgc 900

acgacctgag cctggccgcc cggtacgccg accggctgat cgcgatgaag gacggccgga 960

tcgtggcgag cggggcgccg gacgaggtgc tcaccccggc gctgctggag tcggtcttcg 1020

ggctgcgcgc gatggtggtg cccgacccgg cgaccggcac cccgctggtg atccccctgc 1080

cgcgccccgc cacctcggtg cgggcctgaa atcgatgagc gtggttgctt catcggcctg 1140

ccgagcgatg agagtatgtg ggcggtagag cgagtctcga gggggagatg ccgcc gtg 1198
Val
1

acg tcc tcg tac atg cgc ctg aaa gca gca gcg atc gcc ttc ggt gtg 1246
Thr Ser Ser Tyr Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly Val
5 10 15

atc gtg gcg acc gca gcc gtg ccg tca ccc gct tcc ggc agg gaa cat 1294
Ile Val Ala Thr Ala Ala Val Pro Ser Pro Ala Ser Gly Arg Glu His
20 25 30


gac ggc ggc tat gcg gcc ctg atc cgc cgg gcc tcg tac ggc gtc ccg 1342
Asp Gly Gly Tyr Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val Pro
35 40 45

cac atc acc gcc gac gac ttc ggg agc ctc ggt ttc ggc gtc ggg tac 1390
His Ile Thr Ala Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly Tyr
50 55 60 65

gtg cag gcc gag gac aac atc tgc gtc atc gcc gag agc gta gtg acg 1438
Val Gln Ala Glu Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val Thr
70 75 80

gcc aac ggt gag cgg tcg cgg tgg ttc ggt gcg acc ggg ccg gac gac 1486
Ala Asn Gly Glu Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp Asp
85 90 95

gcc gat gtg cgc agc gac ctc ttc cac cgc aag gcg atc gac gac cgc 1534
Ala Asp Val Arg Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp Arg
100 105 110

gtc gcc gag cgg ctc ctc gaa ggg ccc cgc gac ggc gtg cgg gcg ccg 1582
Val Ala Glu Arg Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala Pro
115 120 125

tcg gac gac gtc cgg gac cag atg cgc ggc ttc gtc gcc ggc tac aac 1630
Ser Asp Asp Val Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr Asn
130 135 140 145

cac ttc cta cgc cgc acc ggc gtg cac cgc ctg acc gac ccg gcg tgc 1678
His Phe Leu Arg Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala Cys
150 155 160

cgc ggc aag gcc tgg gtg cgc ccg ctc tcc gag atc gat ctc tgg cgt 1726
Arg Gly Lys Ala Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp Arg
165 170 175

acg tcg tgg gac agc atg gtc cgg gcc ggt tcc ggg gcg ctg ctc gac 1774
Thr Ser Trp Asp Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu Asp
180 185 190

ggc atc gtc gcc gcg acg cca cct aca gcc gcc ggg ccc gcg tca gcc 1822
Gly Ile Val Ala Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser Ala
195 200 205

ccg gag gca ccc gac gcc gcc gcg atc gcc gcc gcc ctc gac ggg acg 1870
Pro Glu Ala Pro Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly Thr
210 215 220 225

agc gcg ggc atc ggc agc aac gcg tac ggc ctc ggc gcg cag gcc acc 1918
Ser Ala Gly Ile Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala Thr
230 235 240

gtg aac ggc agc ggg atg gtg ctg gcc aac ccg cac ttc ccg tgg cag 1966
Val Asn Gly Ser Gly Met Val Leu Ala Asn Pro His Phe Pro Trp Gln
245 250 255

ggc gcc gca cgc ttc tac cgg atg cac ctc aag gtg ccc ggc cgc tac 2014
Gly Ala Ala Arg Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg Tyr
260 265 270

gac gtc gag ggc gcg gcg ctg atc ggc gac ccg atc atc ggg atc ggg 2062
Asp Val Glu Gly Ala Ala Leu Ile Gly Asp Pro Ile Ile Gly Ile Gly
275 280 285

cac aac cgc acg gtc gcc tgg agc cac acc gtc tcc acc gcc cgc cgg 2110
His Asn Arg Thr Val Ala Trp Ser His Thr Val Ser Thr Ala Arg Arg
290 295 300 305

ttc gtg tgg cac cgc ctg agc ctc gtg ccc ggc gac ccc acc tcc tat 2158
Phe Val Trp His Arg Leu Ser Leu Val Pro Gly Asp Pro Thr Ser Tyr
310 315 320

tac gtc gac ggc cgg ccc gag cgg atg cgc gcc cgc acg gtc acg gtc 2206
Tyr Val Asp Gly Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr Val
325 330 335

cag acc ggc agc ggc ccg gtc agc cgc acc ttc cac gac acc cgc tac 2254
Gln Thr Gly Ser Gly Pro Val Ser Arg Thr Phe His Asp Thr Arg Tyr
340 345 350

ggc ccg gtg gcc gtg atg ccg ggc acc ttc gac tgg acg ccg gcc acc 2302
Gly Pro Val Ala Val Met Pro Gly Thr Phe Asp Trp Thr Pro Ala Thr
355 360 365

gcg tac gcc atc acc gac gtc aac gcg ggc aac aac cgc gcc ttc gac 2350
Ala Tyr Ala Ile Thr Asp Val Asn Ala Gly Asn Asn Arg Ala Phe Asp
370 375 380 385

ggg tgg ctg cgg atg ggc cag gcc aag gac gtc cgg gcg ctc aag gcg 2398
Gly Trp Leu Arg Met Gly Gln Ala Lys Asp Val Arg Ala Leu Lys Ala
390 395 400

gtc ctc gac cgg cac cag ttc ctg ccc tgg gtc aac gtg atc gcc gcc 2446
Val Leu Asp Arg His Gln Phe Leu Pro Trp Val Asn Val Ile Ala Ala
405 410 415

gac gcg cgg ggc gag gcc ctc tac ggc gat cat tcg gtc gtc ccc cgg 2494
Asp Ala Arg Gly Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro Arg
420 425 430

gtg acc ggc gcg ctc gct gcc gcc tgc atc ccg gcg ccg ttc cag ccg 2542
Val Thr Gly Ala Leu Ala Ala Ala Cys Ile Pro Ala Pro Phe Gln Pro
435 440 445

ctc tac gcc tcc agc ggc cag gcg gtc ctg gac ggt tcc cgg tcg gac 2590
Leu Tyr Ala Ser Ser Gly Gln Ala Val Leu Asp Gly Ser Arg Ser Asp
450 455 460 465

tgc gcg ctc ggc gcc gac ccc gac gcc gcg gtc ccg ggc att ctc ggc 2638
Cys Ala Leu Gly Ala Asp Pro Asp Ala Ala Val Pro Gly Ile Leu Gly
470 475 480

ccg gcg agc ctg ccg gtg cgg ttc cgc gac gac tac gtc acc aac tcc 2686
Pro Ala Ser Leu Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn Ser
485 490 495

aac gac agt cac tgg ctg gcc agc ccg gcc gcc ccg ctg gaa ggc ttc 2734
Asn Asp Ser His Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly Phe
500 505 510

ccg cgg atc ctc ggc aac gaa cgc acc ccg cgc agc ctg cgc acc cgg 2782
Pro Arg Ile Leu Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr Arg
515 520 525

ctc ggg ctg gac cag atc cag cag cgc ctc gcc ggc acg gac ggt ctg 2830
Leu Gly Leu Asp Gln Ile Gln Gln Arg Leu Ala Gly Thr Asp Gly Leu
530 535 540 545

ccg ggc aag ggc ttc acc acc gcc cgg ctc tgg cag gtc atg ttc ggc 2878
Pro Gly Lys Gly Phe Thr Thr Ala Arg Leu Trp Gln Val Met Phe Gly
550 555 560

aac cgg atg cac ggc gcc gaa ctc gcc cgc gac gac ctg gtc gcg ctc 2926
Asn Arg Met His Gly Ala Glu Leu Ala Arg Asp Asp Leu Val Ala Leu
565 570 575

tgc cgc cgc cag ccg acc gcg acc gcc tcg aac ggc gcg atc gtc gac 2974
Cys Arg Arg Gln Pro Thr Ala Thr Ala Ser Asn Gly Ala Ile Val Asp
580 585 590

ctc acc gcg gcc tgc acg gcg ctg tcc cgc ttc gat gag cgt gcc gac 3022
Leu Thr Ala Ala Cys Thr Ala Leu Ser Arg Phe Asp Glu Arg Ala Asp
595 600 605

ctg gac agc cgg ggc gcg cac ctg ttc acc gag ttc gcc ctc gcg ggc 3070
Leu Asp Ser Arg Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala Gly
610 615 620 625

gga atc agg ttc gcc gac acc ttc gag gtg acc gat ccg gta cgc acc 3118
Gly Ile Arg Phe Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg Thr
630 635 640

ccg cgc cgt ctg aac acc acg gat ccg cgg gta cgg acg gcg ctc gcc 3166
Pro Arg Arg Leu Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu Ala
645 650 655

gac gcc gtg caa cgg ctc gcc ggc atc ccc ctc gac gcg aag ctg gga 3214
Asp Ala Val Gln Arg Leu Ala Gly Ile Pro Leu Asp Ala Lys Leu Gly
660 665 670

gac atc cac acc gac agc cgc ggc gaa cgg cgc atc ccc atc cac ggt 3262
Asp Ile His Thr Asp Ser Arg Gly Glu Arg Arg Ile Pro Ile His Gly
675 680 685

ggc cgc ggg gaa gca ggc acc ttc aac gtg atc acc aac ccg ctc gtg 3310
Gly Arg Gly Glu Ala Gly Thr Phe Asn Val Ile Thr Asn Pro Leu Val
690 695 700 705

ccg ggc gtg gga tac ccg cag gtc gtc cac gga aca tcg ttc gtg atg 3358
Pro Gly Val Gly Tyr Pro Gln Val Val His Gly Thr Ser Phe Val Met
710 715 720

gcc gtc gaa ctc ggc ccg cac ggc ccg tcg gga cgg cag atc ctc acc 3406
Ala Val Glu Leu Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu Thr
725 730 735

tat gcg cag tcg acg aac ccg aac tca ccc tgg tac gcc gac cag acc 3454
Tyr Ala Gln Ser Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln Thr
740 745 750

gtg ctc tac tcg cgg aag ggc tgg gac acc atc aag tac acc gag gcg 3502
Val Leu Tyr Ser Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu Ala
755 760 765

cag atc gcg gcc gac ccg aac ctg cgc gtc tac cgg gtg gca cag cgg 3550
Gln Ile Ala Ala Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln Arg
770 775 780 785

gga cgc tgacccacgt cacgccggct cggcccgtgc gggggcgcag ggcgccgatc 3606
Gly Arg

gtctctgcat cgccggtcag ccggggcctg cgtcgaccgg cggcggccgg tcgacgcccg 3666

cgtcccggcg cagcgactgg ctgaagcgcc aggcgtcggc ggcccggggc aggttgttga 3726

acatcacgta cgccgggccg ccgtcgagga tgccggcgag gtgtgccagc tcggcatccg 3786

tgtacacatg ccgggcgccg gtgatgccgt gcagccggta ataggccatc ggcgtcagac 3846

tgcggcgcag gaacgggtcg gcggcgtggg tcaggtccag ctcctggcac aagccctcga 3906

ccacctcgtc cggccacggg ccgcgcggct cccacaacag ccggacaccg gccggccggc 3966

gcgctcgggc gcagaactca cgcagtcgcg cgatggcggg ttcggtcggc cggaaactcg 4026

ccgggcactg cag 4039


<210> SEQ ID NO 8 
<211> LENGTH: 787 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence :Actinoplanes 
utahensis echinocandin B (ECB) deacylase protein mutant M-15 
transcribed from gene created by in vitro heteroduplex formation 
followed by in vivo repair 

<400> SEQUENCE: 8 


Val Thr Ser Ser Tyr Met Arg Leu Lys Ala Ala Ala Ile Ala Phe Gly
1 5 10 15

Val Ile Val Ala Thr Ala Ala Val Pro Ser Pro Ala Ser Gly Arg Glu
20 25 30

His Asp Gly Gly Tyr Ala Ala Leu Ile Arg Arg Ala Ser Tyr Gly Val
35 40 45

Pro His Ile Thr Ala Asp Asp Phe Gly Ser Leu Gly Phe Gly Val Gly
50 55 60

Tyr Val Gln Ala Glu Asp Asn Ile Cys Val Ile Ala Glu Ser Val Val
65 70 75 80

Thr Ala Asn Gly Glu Arg Ser Arg Trp Phe Gly Ala Thr Gly Pro Asp
85 90 95

Asp Ala Asp Val Arg Ser Asp Leu Phe His Arg Lys Ala Ile Asp Asp
100 105 110

Arg Val Ala Glu Arg Leu Leu Glu Gly Pro Arg Asp Gly Val Arg Ala
115 120 125

Pro Ser Asp Asp Val Arg Asp Gln Met Arg Gly Phe Val Ala Gly Tyr
130 135 140

Asn His Phe Leu Arg Arg Thr Gly Val His Arg Leu Thr Asp Pro Ala
145 150 155 160

Cys Arg Gly Lys Ala Trp Val Arg Pro Leu Ser Glu Ile Asp Leu Trp
165 170 175

Arg Thr Ser Trp Asp Ser Met Val Arg Ala Gly Ser Gly Ala Leu Leu
180 185 190

Asp Gly Ile Val Ala Ala Thr Pro Pro Thr Ala Ala Gly Pro Ala Ser
195 200 205

Ala Pro Glu Ala Pro Asp Ala Ala Ala Ile Ala Ala Ala Leu Asp Gly
210 215 220

Thr Ser Ala Gly Ile Gly Ser Asn Ala Tyr Gly Leu Gly Ala Gln Ala
225 230 235 240

Thr Val Asn Gly Ser Gly Met Val Leu Ala Asn Pro His Phe Pro Trp
245 250 255

Gln Gly Ala Ala Arg Phe Tyr Arg Met His Leu Lys Val Pro Gly Arg
260 265 270

Tyr Asp Val Glu Gly Ala Ala Leu Ile Gly Asp Pro Ile Ile Gly Ile
275 280 285

Gly His Asn Arg Thr Val Ala Trp Ser His Thr Val Ser Thr Ala Arg
290 295 300

Arg Phe Val Trp His Arg Leu Ser Leu Val Pro Gly Asp Pro Thr Ser
305 310 315 320

Tyr Tyr Val Asp Gly Arg Pro Glu Arg Met Arg Ala Arg Thr Val Thr
325 330 335

Val Gln Thr Gly Ser Gly Pro Val Ser Arg Thr Phe His Asp Thr Arg
340 345 350

Tyr Gly Pro Val Ala Val Met Pro Gly Thr Phe Asp Trp Thr Pro Ala
355 360 365

Thr Ala Tyr Ala Ile Thr Asp Val Asn Ala Gly Asn Asn Arg Ala Phe
370 375 380

Asp Gly Trp Leu Arg Met Gly Gln Ala Lys Asp Val Arg Ala Leu Lys
385 390 395 400

Ala Val Leu Asp Arg His Gln Phe Leu Pro Trp Val Asn Val Ile Ala
405 410 415

Ala Asp Ala Arg Gly Glu Ala Leu Tyr Gly Asp His Ser Val Val Pro
420 425 430

Arg Val Thr Gly Ala Leu Ala Ala Ala Cys Ile Pro Ala Pro Phe Gln
435 440 445

Pro Leu Tyr Ala Ser Ser Gly Gln Ala Val Leu Asp Gly Ser Arg Ser
450 455 460

Asp Cys Ala Leu Gly Ala Asp Pro Asp Ala Ala Val Pro Gly Ile Leu
465 470 475 480

Gly Pro Ala Ser Leu Pro Val Arg Phe Arg Asp Asp Tyr Val Thr Asn
485 490 495

Ser Asn Asp Ser His Trp Leu Ala Ser Pro Ala Ala Pro Leu Glu Gly
500 505 510






US 6,537,746 B2


Phe Pro Arg Ile Leu Gly Asn Glu Arg Thr Pro Arg Ser Leu Arg Thr
515 520 525

Arg Leu Gly Leu Asp Gln Ile Gln Gln Arg Leu Ala Gly Thr Asp Gly
530 535 540

Leu Pro Gly Lys Gly Phe Thr Thr Ala Arg Leu Trp Gln Val Met Phe
545 550 555 560

Gly Asn Arg Met His Gly Ala Glu Leu Ala Arg Asp Asp Leu Val Ala
565 570 575

Leu Cys Arg Arg Gln Pro Thr Ala Thr Ala Ser Asn Gly Ala Ile Val
580 585 590

Asp Leu Thr Ala Ala Cys Thr Ala Leu Ser Arg Phe Asp Glu Arg Ala
595 600 605

Asp Leu Asp Ser Arg Gly Ala His Leu Phe Thr Glu Phe Ala Leu Ala
610 615 620

Gly Gly Ile Arg Phe Ala Asp Thr Phe Glu Val Thr Asp Pro Val Arg
625 630 635 640

Thr Pro Arg Arg Leu Asn Thr Thr Asp Pro Arg Val Arg Thr Ala Leu
645 650 655

Ala Asp Ala Val Gln Arg Leu Ala Gly Ile Pro Leu Asp Ala Lys Leu
660 665 670

Gly Asp Ile His Thr Asp Ser Arg Gly Glu Arg Arg Ile Pro Ile His
675 680 685

Gly Gly Arg Gly Glu Ala Gly Thr Phe Asn Val Ile Thr Asn Pro Leu
690 695 700

Val Pro Gly Val Gly Tyr Pro Gln Val Val His Gly Thr Ser Phe Val
705 710 715 720

Met Ala Val Glu Leu Gly Pro His Gly Pro Ser Gly Arg Gln Ile Leu
725 730 735

Thr Tyr Ala Gln Ser Thr Asn Pro Asn Ser Pro Trp Tyr Ala Asp Gln
740 745 750

Thr Val Leu Tyr Ser Arg Lys Gly Trp Asp Thr Ile Lys Tyr Thr Glu
755 760 765

Ala Gln Ile Ala Ala Asp Pro Asn Leu Arg Val Tyr Arg Val Ala Gln
770 775 780

Arg Gly Arg
785


<210> SEQ ID NO 9 

<211> LENGTH: 21 

<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 

<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence : forward 
primer corresponding to the vector sequence of pGFP 
plasmid (Aequorea victoria green fluorescent 
protein) 

<400> SEQUENCE: 9 

ccgactggaa agcgggcagt g 21 


<210> SEQ ID NO 10 
<211> LENGTH: 22 
<212> TYPE: DNA 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<223> OTHER INFORMATION: Description of Artificial Sequence: reverse
primer corresponding to the vector sequence of pGFP
plasmid (Aequorea victoria green fluorescent
protein)


<400> SEQUENCE: 10 

cggggctggc ttaactatgc gg 22 


<210> SEQ ID NO 11 
<211> LENGTH: 4 
<212> TYPE: PRT 

<213> ORGANISM: Artificial Sequence 
<220> FEATURE: 

<221> NAME/KEY: MOD_RES
<222> LOCATION: (1)
<223> OTHER INFORMATION: Xaa = succinyl-Ala
<221> NAME/KEY: MOD_RES
<222> LOCATION: (4)
<223> OTHER INFORMATION: Xaa = Phe-p-nitroanilide
<223> OTHER INFORMATION: Description of Artificial Sequence: Bacillus
subtilis subtilisin E thermostability assay
substrate

<400> SEQUENCE: 11 

Xaa Ala Pro Xaa 
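
As an illustrative check only, the two primer entries above (SEQ ID NOS 9 and 10) can be compared against their declared lengths with a short script. This is a minimal sketch: the helper names `clean`, `gc_fraction` and `reverse_complement` are hypothetical and are not part of the listing, and the script assumes the primers contain only the bases a, c, g and t.

```python
# Primer sequences copied from SEQ ID NO 9 and SEQ ID NO 10 (spacing as printed).
FORWARD_PRIMER = "ccgactggaa agcgggcagt g"
REVERSE_PRIMER = "cggggctggc ttaactatgc gg"

# Translation table for complementing unambiguous DNA bases (assumption: no IUPAC codes).
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def clean(seq):
    """Remove the blank spacing that sequence listings insert every ten bases."""
    return "".join(seq.split())


def gc_fraction(seq):
    """Fraction of G and C bases in the sequence."""
    s = clean(seq).lower()
    return (s.count("g") + s.count("c")) / len(s)


def reverse_complement(seq):
    """Reverse complement of a DNA sequence, preserving letter case."""
    return clean(seq).translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for name, primer, declared in (
        ("SEQ ID NO 9", FORWARD_PRIMER, 21),
        ("SEQ ID NO 10", REVERSE_PRIMER, 22),
    ):
        length = len(clean(primer))
        print(name, "length", length, "matches <211>:", length == declared)
        print(name, "GC fraction", round(gc_fraction(primer), 3))
        print(name, "reverse complement", reverse_complement(primer))
```

The length comparison uses the values given in the `<211>` fields; the GC fraction and reverse complement are printed only as conveniences for primer handling.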


What is claimed is: 

1. A method for evolving a polynucleotide toward acqui- 
sition of a desired functional property, comprising 

(a) incubating a population of parental polynucleotide 
variants having sufficient diversity that recombination 
between the parental polynucleotide variants can generate
more recombined polynucleotides than there
are parental polynucleotide variants under conditions to
generate annealed polynucleotides comprising hetero- 
duplexes; 

(b) exposing the heteroduplexes to one or more enzymes 
of a DNA repair system in vitro to convert the hetero- 
duplexes to parental polynucleotide variants or recom- 
bined polynucleotide variants; 

(c) screening or selecting the recombined polynucleotide 
variants for the desired functional property. 
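
The diversity condition in step (a) can be illustrated with a small enumeration. The following is a minimal sketch under a simplifying assumption that is not stated in the claim: each mismatch in a heteroduplex is resolved independently to the base of one parent or the other. Real repair systems may correct several nearby mismatches together or favor one strand, so the count below is an idealized upper bound. The function name `repair_outcomes` and the example sequences are hypothetical.

```python
from itertools import product


def repair_outcomes(parent_a, parent_b):
    """Enumerate every sequence obtainable when each mismatched position
    of an A/B heteroduplex is resolved to either parent's base.

    Both parents are written in the same (sense) orientation and must be
    pre-aligned to the same length.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    # One choice at matched positions, two choices at mismatched positions.
    choices = [(a,) if a == b else (a, b) for a, b in zip(parent_a, parent_b)]
    return {"".join(combo) for combo in product(*choices)}


if __name__ == "__main__":
    # Hypothetical parents differing at three positions.
    parent_a = "ACGTACGTAC"
    parent_b = "ATGTACCTAG"
    outcomes = repair_outcomes(parent_a, parent_b)
    recombined = outcomes - {parent_a, parent_b}
    print("total outcomes:", len(outcomes))
    print("recombined variants:", len(recombined))
    for seq in sorted(recombined):
        print(" ", seq)
```

With k mismatched positions this idealized model gives 2^k products, two of which are the parental sequences, so two parents differing at three positions can already yield more recombined variants than there are parents.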

2. The method of claim 1, wherein the DNA repair system 
comprises cellular extracts. 

3. The method of claim 1, wherein the cells are bacterial 
cells. 

4. The method of claim 1 further comprising introducing 
the products of step (b) into cells. 

5. The method of claim 4, wherein the introducing step 
selects for transformed cells receiving recombinant poly- 
nucleotides resulting from resolution of heteroduplexes in 
step (b) relative to transformed cells receiving polynucle- 
otides resulting from resolution of homoduplexes in step (b). 

6. A method for evolving a polynucleotide toward acqui- 
sition of a desired functional property, comprising 

(a) incubating a population of parental polynucleotide 
variants having sufficient diversity that recombination 
between the parental polynucleotide variants can gen- 
erate more recombined polynucleotides than there are 
parental polynucleotide variants under conditions to 
generate annealed polynucleotides comprising hetero- 
duplexes; 

(b) introducing the annealed polynucleotides into cells 
having a DNA repair system and propagating the cells 
under conditions to select for cells receiving heterodu- 
plexes relative to cells receiving homoduplexes, and to 
convert the heteroduplexes to parental polynucleotide 
variants or recombined polynucleotide variants; 


(c) screening or selecting the recombined polynucleotide 
variants for the desired functional property. 

7. The method of claim 6, wherein the heteroduplexes are 
exposed to the cellular DNA repair system in vitro. 

8. A method for evolving a polynucleotide toward acqui- 
sition of a desired functional property, comprising 

(a) incubating first and second pools of parental poly- 
nucleotide variants having sufficient diversity that 
recombination between the parental polynucleotide 
variants can generate more recombined polynucle- 
otides than there are parental polynucleotide variants 
under conditions whereby a strand from any polynucle- 
otide variant in the first pool can anneal with a strand 
from any polynucleotide in the second pool to generate 
annealed polynucleotides comprising heteroduplexes; 

(b) exposing the heteroduplexes to a DNA repair system 
to convert the heteroduplexes to parental polynucle- 
otide variants or recombined polynucleotide variants; 

(c) screening or selecting the recombined polynucleotide 
variants for the desired functional property. 

9. The method of claim 8, further comprising introducing 
the heteroduplexes into cells, whereby the heteroduplexes 
are exposed to the DNA repair system of the cells in vivo. 

10. The method of claim 9, wherein the annealed poly- 
nucleotides further comprise homoduplexes and the intro- 
ducing step selects for transformed cells receiving hetero- 
duplexes relative to transformed cells receiving 
homoduplexes. 

11. The method of claim 10, 6, or 5, wherein a first 
polynucleotide variant is provided as a component of a first 
vector, and a second polynucleotide variant is provided as a 
component of a second vector, and the method further 
comprises converting the first and second vectors to linear- 
ized forms in which the first and second polynucleotide 
variants occur at opposite ends, whereby in the incubating 
step single-stranded forms of the first linearized vector 
reanneal with each other to form linear first vector, single- 
stranded forms of the second linearized vector reanneal with 
each other to form linear second vector, and single-stranded
linearized forms of the first and second vectors anneal with
each other to form a circular heteroduplex bearing a nick in each
strand, and the introducing step selects for transformed cells
receiving the circular heteroduplexes or recombinant polynucleotides
derived therefrom relative to the linear first and
second vector.

12. The method of claim 11, wherein the first and second
vectors are converted to linearized forms by PCR.

13. The method of claim 11, wherein the first and second 
vectors are converted to linearized forms by digestion with 
first and second restriction enzymes. 

14. The method of claim 10, 6 or 5, wherein the population
of polynucleotides comprises first and second polynucleotides
provided in double stranded form, and the
method further comprises incorporating the first and second
polynucleotides as components of first and second vectors,
whereby the first and second polynucleotides occupy opposite
ends of the first and second vectors, whereby in the
incubating step single-stranded forms of the first linearized
vector reanneal with each other to form linear first vector,
single-stranded forms of the second linearized vector reanneal
with each other to form linear second vector, and
single-stranded linearized forms of the first and second
vectors anneal with each other to form a circular heteroduplex
bearing a nick in each strand, and the introducing step
selects for transformed cells receiving the circular heteroduplexes
or recombinant polynucleotides derived therefrom
relative to the linear first and second vector.

15. The method of claim 10, 6 or 5, further comprising 
sealing nicks in the heteroduplexes to form covalently- 
closed circular heteroduplexes before the introducing step. 

16. The method of claim 1, 6 or 8, wherein the population
of polynucleotide variants are provided in double stranded
form, and the method further comprises converting the
double stranded polynucleotides to single stranded polynucleotides
before the annealing step.

17. The method of claim 1, 6 or 8, wherein the converting
step comprises:

conducting asymmetric amplification of the first and 
second double stranded polynucleotide variants to 
amplify a first strand of the first polynucleotide variant, 
and a second strand of the second polynucleotide 
variant, whereby the first and second strands anneal in 
the incubating step to form a heteroduplex. 

18. The method of claim 17, wherein the first and second
double-stranded polynucleotide variants are provided in
vector-free form, and the method further comprises incorporating
the heteroduplex into a vector.

19. The method of claim 18, wherein the first and second 
polynucleotides are from chromosomal DNA. 

20. The method of claim 1, 6 or 8, further comprising 
repeating steps (a)-(c) whereby the incubating step in a 
subsequent cycle is performed on recombinant variants from 
a previous cycle. 

21. The method of claim 1, 6 or 8, wherein the polynucle- 
otide variants encode a polypeptide. 

22. The method of claim 1, 6 or 8, wherein the population 
of polynucleotide variants comprises at least 20 variants. 

23. The method of claim 1, 6 or 8, wherein the population 
of polynucleotide variants are at least 10 kb in length. 

24. The method of claim 1, 6 or 8, wherein the population 
of polynucleotide variants comprises natural variants. 

25. The method of claim 1, 6 or 8, wherein the population
of polynucleotides comprises variants generated by 
mutagenic PCR. 

26. The method of claim 1, 6 or 8, wherein the population 
of polynucleotide variants comprises variants generated by 
site directed mutagenesis. 

27. The method of claim 1, 6 or 8, further comprising at
least partially demethylating the population of variant polynucleotides.

28. The method of claim 27, wherein the at least partially
demethylating step is performed by PCR amplification of the 
population of variant polynucleotides. 

29. The method of claim 27, wherein the at least partially 
demethylating step is performed by amplification of the 
population of variant polynucleotides in host cells. 

30. The method of claim 29, wherein the host cells are 
defective in a gene encoding a methylase enzyme. 

31. The method of claim 27, wherein the population of 
variant polynucleotides are double stranded polynucleotides 
and only one strand of each polynucleotide is at least 
partially demethylated. 

32. The method of claim 1, 6 or 8, wherein the population 
of polynucleotide variants comprises at least 5
polynucleotides having at least 90% sequence identity with 
one another. 
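
A sequence-identity condition of this kind can be checked pairwise across a population. The sketch below assumes the variants are already aligned and of equal length, and it counts identity as the fraction of matching positions; the claims do not prescribe a particular alignment or scoring method, so this is only one possible reading. The names `percent_identity`, `all_pairs_at_least` and the example sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent of aligned positions at which two equal-length sequences match."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be pre-aligned, equal length and non-empty")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def all_pairs_at_least(seqs, threshold=90.0):
    """True if every pair of sequences meets the identity threshold."""
    return all(
        percent_identity(a, b) >= threshold for a, b in combinations(seqs, 2)
    )


def substitute(seq, index, base):
    """Return a copy of seq with one position replaced (toy variant generator)."""
    return seq[:index] + base + seq[index + 1:]


if __name__ == "__main__":
    # Hypothetical 20-nt parent and four single-substitution variants.
    parent = "ACGTACGTACGTACGTACGT"
    variants = [
        parent,
        substitute(parent, 0, "T"),
        substitute(parent, 5, "A"),
        substitute(parent, 10, "A"),
        substitute(parent, 15, "C"),
    ]
    for a, b in combinations(variants, 2):
        print(a, b, percent_identity(a, b))
    print("all pairs >= 90%:", all_pairs_at_least(variants, 90.0))
```

The same `percent_identity` function can be read in reverse, since 100 minus the identity gives the percentage of positions at which two variants differ.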

33. The method of claim 1, 6 or 8, further comprising 
isolating a screened recombinant variant. 

34. The method of claim 33, further comprising expressing
a screened recombinant variant to produce a recombinant
protein.

35. The method of claim 34, further comprising formu- 
lating the recombinant protein with a carrier to form a 
pharmaceutical composition. 

36. The method of claim 1, 6 or 8, wherein the polynucle- 
otide variants encode enzymes selected from the group 
consisting of proteases, lipases, amylases, cutinases,
cellulases, oxidases, peroxidases and phytases.

37. The method of claim 1, 6 or 8, wherein the polynucle- 
otide variants encode a polypeptide selected from the group 
consisting of insulin, ACTH, glucagon, somatostatin, 
somatotropin, thymosin, parathyroid hormone, pigmentary 
hormones, somatomedin, erythropoietin, luteinizing 
hormone, chorionic gonadotropin, hypothalamic releasing
factors, antidiuretic hormones, thyroid stimulating hormone, 
relaxin, interferon, thrombopoietin (TPO), and prolactin. 

38. The method of claim 1, 6 or 8, wherein the polynucle- 
otide variants encode a plurality of enzymes forming a 
metabolic pathway. 

39. The method of claim 1, 6 or 8, wherein the polynucle- 
otide variants are in concatemeric form. 

40. The method of claim 39, wherein the functional 
property is an enzymatic activity. 

41. The method of claim 1, 6 or 8, wherein the at least two
polynucleotide variants differ at between 0.1% and 25% of positions.

42. The method of claim 1, 6 or 8, wherein the functional 
property is an enzymatic activity. 
